CHAPTER 11


THE ALGORITHMS OF SUMMARISATION


11.00 INTRODUCTION 


In Vol. I our main concern has been with problems involving comparison of scores of not
more than 2 samples from the same universe or from identical universes. When this is the 
end in view, the score may be: (a) the mean or total value of a count (e.g. income) or measure- 
ment (e.g. height) attached to each individual of the group ; (b) the proportion or number of 
individuals with a particular attribute. Whenever we have so far used the method of scoring 
last mentioned, we have done so on the understanding that the constituent individuals of 
the group are assignable to one of two classes A or not-A. We have not examined the 
consequences of any null hypothesis when: (a) our definition of sample structure entails a 
specification of individuals assignable to more than 2 classes ; (b) our comparison of samples 
involves more than two of them at a time. Such are the issues we shall chiefly explore in this
volume. 

In doing so, we shall acquaint ourselves with different statistical techniques which have much 
in common, though their uses may be greatly different. What is common ground may be 
referable to the logic of classifying and summarising the relevant data or to the algebra of the 
sampling distributions invoked as a basis for significance tests. For that reason, we shall not 
attempt to deal comprehensively with any one of them as a separate entity in the chapters which 
follow; and a brief summary of the lay-out of this volume may therefore be helpful to the 
beginner who wishes to make the best use of it. 

The only theoretical sampling distribution we have so far employed as a basis for a signifi- 
cance test is the normal ; but we have seen that the reasons for relying on the normal distribution 
in one context may be quite different from the reasons for relying on it in another. For instance, 
different reasons justify our belief that the normal curve gives in practice a good enough 
description for specifying the frequency of different possible values of: (a) the mean score of 100 
tosses of a die; (b) the difference between the proportion of hearts in samples of 50 and 75 cards 
taken from a full pack with replacement of each card drawn before extraction of another. Similar 
remarks apply to the use of the sampling distributions invoked by the statistical techniques 
dealt with in this volume. Thus the reasons for performing a Chi-Square Test, i.e. a test based 
on a special form of Pearson’s Type III, or the reasons for performing a t-Test, i.e. a test based 
on a special form of Pearson’s Type VII, will depend on the character of the statistic whose 
sampling distribution is under consideration ; but we are in a position to grasp what the reasons 
are only if we have some acquaintance with the algebraic properties of Pearson’s Type III or 
Type VII, as the case may be.

For this reason, and especially because of a common pattern to which Pearson's Types 
conform, it will be convenient to deal with significance tests in a sequence of three chapters 
(14-16) rather than to set forth the rationale of a test appropriate to a particular statistical pro- 
cedure in the same context as the exposition of the terms of reference of the latter. All the new 
tests we shall meet do in fact come within the compass of Pearson's system of which we had a 
preview in Chapter 6, being referable to Types I-III, VI and VII. If we recognise this at the 
outset, it will be possible to understand the relevance of each to the statistical procedure which 
invokes it without recourse to considerations derived from the geometry of hyperspace, or
the use of matrix algebra. The standpoint of the author is that a statistical test based on
the assumption of a continuous score distribution is merely a device based on curve-fitting,
and as such is at best an approximate description of the real world. From the standpoint 
of the student who is not a trained mathematician it is therefore fortunate that Pearson’s system 
of curve-fitting by moments anticipates so many of the requirements of subsequent theory. 
Hence the chapters dealing with significance tests start with an elementary exposition of the 
properties of moments and a review of Pearson’s type system in so far as it is pertinent to the 
end in view.

As stated, the several statistical procedures dealt with below have in common that they call 
for methods more elaborate than those employed in classifying and summarising quantitative 
data relevant to the issues dealt with in Vol. I; and it is easy for the beginner to confuse two 
different aims which may converge in the exposition of the algebraic rationale of any one of them. 
One is to specify certain relations—which we may speak of as tautologies—between numbers set 
out in a particular framework of classification, their truth having as such no necessary connexion 
with the theory of probability. The other is to construct summarising indices which have 
properties consonant with the requirements of sampling theory. The two aims overlap. For 
the interpretation of tautologies suggestive of an index whose sampling distribution is specifiable 
—or approximately specifiable—is essential to an understanding of the use of a statistical method, 
if only because its use depends on what information it summarises. 

At this stage, the last remarks may not be clear to the reader as yet unaware that it is rarely 
profitable to read a book by starting at the beginning and continuing to the end. The author 
can merely hope that some will return to our last words after a first quick perusal. Here it must 
suffice to say that there is common ground in the task of summarising data for very different 
statistical techniques which employ several criteria of classification, and hence of making the 
logical assumptions inherent in their credentials. Consequently, we shall start (in this chapter) 
with the exposition of notations which have no other justification than to reduce the effort 
involved in recognising some purely formal relations between numerical data when assembled in 
a particular way. Against this background, we shall examine (Chapters 12-13) the rationale 
of two statistical procedures without reference to what tests we appropriately invoke when 
applying them.


11.01 THREE TYPES OF GRID 


It may first be helpful if we clarify two arbitrary levels of classification, which we may distin-
guish as uni-dimensional and multi-dimensional. In the first category we include : (a) classifying 
a population (universe or sample) by stating how many or what proportion of individuals belong 
to a particular class distinguished by some attribute specifiable in either explicitly quantitative 
terms (e.g. tall, meaning 5 ft. 7 in. or over, or anaemic, meaning with an erythrocyte count of less 
than three million per cu. mm. of blood) or qualitative terms (e.g. yellow, Protestant, naturalised 
American) ; (b) classifying a population on some uniform scale as by stating how many individuals 
(at a given time) have a body temperature of such and such by intervals of one-tenth of a degree 
Fahrenheit, or an income of such and such by intervals of £50 per annum. In contra-distinction 
to the above the simplest sort of multi-dimensional classification (i.e. a 2-dimensional) arises 
when 


(i) we can assign to every individual of a population two scores (e.g. height and weight, 
or earned and unearned income) ; 


(ii) we can state how many members of each of one exclusive set of sub-populations (e.g.
Protestant, Catholic, Greek Orthodox, Other) are assignable to another of a second
exclusive set (e.g. American-born, Naturalised, Other) ;


(iii) we can assign to every individual (or group) a score value (e.g. blood calcium level or
milk yield) on a 1-dimensional scale, and can assign each individual (or group) to one 
of two or more exclusive sets of sub-populations (e.g. urban and rural, tuberculous 
and healthy). 


The questions which prompt us to classify data in one or other way referred to in the 
preceding paragraph are various ; but our method of assembling our data depends as much on 
the nature of the data as on the nature of the question. Inherent in any method we adopt are 
certain relations implicit in the method itself; and our preliminary task in this chapter will be 
to examine the problem of summarising data to bring into focus such relations aside from any 
utility they may prove to have from the standpoint of statistical theory. Corresponding to 
each of the 3 types of 2-dimensional classification we may thus prescribe as the first step in 
the summarisation of our numerical data a particular method of tabulation, i.e. a grid-wise lay-out. 

For case (i) which we speak of as the bivariate population, it takes the form shown below as
a score-frequency (more briefly, frequency) grid. The explicit entry ($n_{ij}$) in each grid cell is the
number or proportion of individuals with a unique combination of A-scores and B-scores as
indicated by the entries at the head of each column and row. To each cell we can therefore
assign a score function of an A-score ($a_i$) alone (e.g. $a_i^2$), of a B-score ($b_j$) alone (e.g. $b_j^2$) or of both
(e.g. $a_i b_j$). The reader of Volume I has made the acquaintance of this lay-out in Chapters 8-9.


Border scores    a_0     a_1     a_2     a_3
b_0              n_00    n_10    n_20    n_30
b_1              n_01    n_11    n_21    n_31
b_2              n_02    n_12    n_22    n_32


The important peculiarity of the frequency (or correlation) grid is that each dimension 
carries a set of border-scores which collectively specify the criterion of sub-classification (e.g. 
height) in that dimension, and the notation makes sufficiently explicit the corresponding frequency 
of the cell-score under consideration. For example, we may be interested in the distribution
of score products formed from powers of the A-scores and B-scores. If so, we can write down the
frequency of the product formed from a particular pair of border-scores, say $a_2$ and $b_1$, as the
value of $n_{21}$ shown in the table.

The appropriate lay-out for data specified as case (ii) above is the contingency grid : 


Protestant Catholic Greek Other TOTAL 


American-born 
Naturalised 
Other 


TOTAL 


The totals set out here have a special importance, because it is implicit in the structure of 
a true contingency table that we can specify the number or proportion of individuals in each 
of the B-classes (here national status) assignable to any one of the A-classes (here religious faith). 
Thus the column totals ($N_{i.}$), row totals ($N_{.j}$) and the grand total ($N$) are all fixed. It follows from
this that we can fill in any cell of a row or any cell of a column, by deduction of the residual total,
if we know the entries for all the other cells of the same row or column. In a grid of r rows and
c columns (i.e. rc cells excluding the column and row totals), there are thus c redundant row
entries and r redundant column entries, and in all r + c − 1 redundant entries, since the last
cell of the last row and the last column is common to both. Hence the number of cell entries
we require to know before we can complete the table is rc − (r + c − 1) = (r − 1)(c − 1).
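
The counting argument above can be checked mechanically. The sketch below is an illustration only: the function name `complete_table` and the marginal totals are hypothetical, not taken from the text. It fills a grid of 3 rows and 4 columns from its (r − 1)(c − 1) = 6 free cells and the fixed border totals.

```python
def complete_table(known, row_totals, col_totals):
    """Fill an r x c table from its (r-1) x (c-1) free cells and fixed margins.

    known[j][i] holds the free cell of row j and column i.
    """
    r, c = len(row_totals), len(col_totals)
    table = [[0] * c for _ in range(r)]
    for j in range(r - 1):
        for i in range(c - 1):
            table[j][i] = known[j][i]
        # The last cell of each row is the residue of the row total.
        table[j][c - 1] = row_totals[j] - sum(known[j])
    for i in range(c):
        # Every cell of the last row is the residue of its column total.
        table[r - 1][i] = col_totals[i] - sum(table[j][i] for j in range(r - 1))
    return table


# Hypothetical margins for a grid of 3 rows and 4 columns (N = 100).
row_totals = [50, 30, 20]
col_totals = [40, 30, 20, 10]
known = [[20, 15, 10],
         [12, 9, 6]]

full = complete_table(known, row_totals, col_totals)
free_cells = (len(row_totals) - 1) * (len(col_totals) - 1)
```

The last row is obtained from the column totals alone, and its sum then agrees with the last row total because both sets of margins add up to the same grand total.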
This restriction distinguishes what we here call a true contingency table from a lay-out 
which is superficially like it. Following Mendel one may classify pea plants exclusively by seed- 
coat as yellow or green and exclusively by stature as tall or short. All we may happen to know 
about the possible structure of a sample of N plants we may then set out in one dimension as 


Short. 


Yellow. 


d = (N-a-b-c) 


Alternatively, our lay-out may be 


Tall Short 


Green 


Yellow d = (N-a-b-c) 


Total 


da 


A spurious contingency table of this sort summarises the possibilities rather than the actualities 
of an N-fold population structure; and the reader will note that we require to know rc — 1 
cell entries before we can complete it in the absence of the additional information our row and 
column totals of a true contingency table supply. 

The two kinds of tabulation schematised in the foregoing remarks have this in common 
that the explicit cell entries are absolute or relative frequencies, i.e. proportions or numbers of 
individuals. For data specified as case (iii) above, the appropriate lay-out is a grid of which
the cell entries are scores. If we have only one qualitative criterion of classification the score-
grid is merely a set of scores laid out in any order within columns referable to all-or-none
classes ; we can set out data classifiable w.r.t. more than one qualitative criterion as below.
Here the cell entries ($x_{ij}$) are scores referable to individual members of the population or
groups of individuals specified as members of one or other class of 2 different sets as indicated
by labelling the rows and columns. Thus we may be able exclusively and simultaneously to
assign the fertility rates ($x_{ij}$) of groups which share the same religious faith and groups
distinguished by national status, as below :


Protestant Catholic Other 


American-born 
Naturalised 


Other 


The student will note that we have added no entries for score totals at the foot of the columns
or margin of the rows. Nor would the addition of such information dispense with the need to 
make every cell entry explicit, unless we also knew how many individuals each sub-population 
contains. In what follows we shall explore relations between numerical characteristics of sub- 
populations classified in 2 or more dimensions with special reference to the correlation grid 


(11.02-11.04) and the score-grid (11.05). 


Relation Between Score-Grid and Correlation Grid 


As pointed out (p. 408) in Chapter 10 and illustrated in Fig. 83 of Vol. I, it is not possible 
to convert a score-grid exhibiting two different criteria of classification (one referable to columns, 
the other to rows) into a 2-dimensional frequency grid ; but it is always possible to summarise 
the data exhibited in a correlation grid by recourse to the alternative device of a score-grid with 
one explicit criterion (A, not-A) of classification indicated at the margin of the columns or rows.
Such a score-grid of two arrays is indeed the lay-out for computation of the product-moment
index as in Chapter 8 (pp. 354-355). We may in fact summarise the distribution of sixteen
paired ($x_a$, $x_b$) scores in three ways as in the numerical example below :


(i) Frequency Grid


                         (x_a)
Border-scores      0     1     2     3
             0     1     1     0     0
             1     1     3     2     0
(x_b)
             2     0     2     3     1
             3     0     0     1     1

(ii) Bivariate Score Distribution

x_a, x_b            0,0   0,1   1,0   1,1   1,2   2,1   2,2   2,3   3,2   3,3
Rel. Freq. (× 16)    1     1     1     3     2     2     3     1     1     1


(iii) Score-Grid of 2 rows and 16 columns (one explicit criterion)


x_a    0  0  1  1  1  1  1  1  2  2  2  2  2  2  3  3
x_b    0  1  0  1  1  1  2  2  1  1  2  2  2  3  2  3


Subscript Notation 


Many of the problems of manifold classification are simple, or at least amenable to simple treatment 
from an algebraic viewpoint. The difficulties of the beginner arise especially from the difficulty of 
recognising the precise meaning of the symbols. That is why we shall here use a notation which is at 
first sight cumbersome and to the student unaccustomed to subscript notation a little formidable. 
Fortunately, familiarity will breed contempt for the disinclination to get used to it. In fact, it is much 
easier to apply elementary mathematics, if one makes the meaning of the symbols as explicit as possible. 
For instance, $b_{oh}$, $b_{oas}$, $b_{oap}$ for British officers at home, abroad on service and abroad as prisoners of war
are less confusing to work with than the x, y, z of the school books.


11.02 ALGORITHMS OF THE SCORE-FREQUENCY GRID


When we say that the correlation or frequency grid is the appropriate summarising lay-out 
for a bivariate population, we use population and individual respectively for an assemblage and 
its constituents whether human or otherwise, inanimate or living. The population, as in the
model situations of Chapter 8 in Vol. I, may consist of individual human beings to each of which 
we can assign examination marks for two different subjects (A and B) or, as in the models of 
Chapter 9, a sequence of individual games each of which is specifiable in terms of the score of 
each of two players (A and B). For purposes of computation it may be convenient to use whole 
numbers as our explicit cell entries, though in fact the most suitable schema for computation 
by machine work will not take the grid form. For analysis of the numerical rules inherent in 
any such lay-out, it is more convenient to use frequencies ($y_{ij}$) as defined in Chapter 3, p. 101,
being the ratio of the cell number ($n_{ij}$) to the grand total ($N$), i.e. $y_{ij} = n_{ij} \div N$. We may
then set out the grid as below :


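Border scores    a_0     a_1     a_2     ...    a_i     ...    TOTALS
b_0              y_00    y_10    y_20    ...    y_i0    ...    y_.0
b_1              y_01    y_11    y_21    ...    y_i1    ...    y_.1
b_2              y_02    y_12    y_22    ...    y_i2    ...    y_.2
...
b_j              y_0j    y_1j    y_2j    ...    y_ij    ...    y_.j
...
TOTALS           y_0.    y_1.    y_2.    ...    y_i.    ...    1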


This schema embraces more than the structure of a bivariate population as defined above. 
With appropriate interpretation of $y_{ij}$ it describes the frequency distribution of any function of
the scores of two samples drawn independently from the same or identical universes. Thus
the cell-score corresponding to $y_{ij}$ is $(a_i + b_j)$ if our concern is with the distribution of the score-
sum and $(a_i - b_j)$ if our concern is with the distribution of the raw-score difference. In any case,
the column totals ($y_{i.}$ etc.) and the row totals ($y_{.j}$ etc.) respectively specify the frequencies of the
border A-scores and B-scores, in this case the two sets of sample scores; and the product rule
of independent assortment is equivalent to the substitution in each cell of the identity

$$y_{ij} = y_{i.} \cdot y_{.j} \tag{i}$$
In any case, for a grid of c columns and r rows labelled respectively from 0 to (c − 1) and 0 to
(r − 1), in conformity with the convention for finite difference series,

$$\sum_{i=0}^{c-1} y_{i.} = 1 = \sum_{j=0}^{r-1} y_{.j}$$

$$\sum_{j=0}^{r-1} \sum_{i=0}^{c-1} y_{ij} = 1 = \sum_{i=0}^{c-1} \sum_{j=0}^{r-1} y_{ij} \tag{iv}$$

If $x_{ij}$ is the cell-score of the ith column and the jth row, we may denote the mean value of
the x-score for the ith column as $M_{x.i}$, for the jth row as $M_{x.j}$ and for the whole grid as $M_x$.
By definition of the mean, we then have

$$M_{x.i} = \frac{1}{y_{i.}} \sum_{j=0}^{r-1} y_{ij} x_{ij} \, ; \qquad M_{x.j} = \frac{1}{y_{.j}} \sum_{i=0}^{c-1} y_{ij} x_{ij} \tag{v}$$

$$\sum_{i=0}^{c-1} y_{i.} M_{x.i} = \sum_{i=0}^{c-1} \sum_{j=0}^{r-1} y_{ij} x_{ij} = M_x = \sum_{j=0}^{r-1} \sum_{i=0}^{c-1} y_{ij} x_{ij} = \sum_{j=0}^{r-1} y_{.j} M_{x.j} \tag{vi}$$


Example 1.—From the crude data on the extreme left determine the row-means, column-means and 
grand mean of $x = a^2 b$.


                  Crude data                          Frequencies
                  A-scores (a_i)                      A-scores (a_i)
                  0     1     2    TOTAL              0       1       2       TOTAL
            1     1     3     4      8                0.025   0.075   0.10    0.20
B-scores    2     2     2     8     12                0.05    0.05    0.20    0.30
(b_j)       3     3     4     5     12                0.075   0.10    0.125   0.30
            4     2     1     5      8                0.05    0.025   0.125   0.20
        TOTAL     8    10    22     40                0.20    0.25    0.55    1.00


The column means of the x-score are

0 ;  (0.075 + 0.10 + 0.30 + 0.10) ÷ 0.25 ;  (0.4 + 1.6 + 1.5 + 2.0) ÷ 0.55
= 0 ; 2.3 ; 10.

The row means are

(0.075 + 0.4) ÷ 0.2 ;  (0.10 + 1.6) ÷ 0.3 ;  (0.30 + 1.5) ÷ 0.3 ;  (0.10 + 2.0) ÷ 0.2
= 2.375 ; 5.67 ; 6 ; 10.5.

The grand mean is

(0.075 + 0.4 + 0.10 + 1.6 + 0.30 + 1.5 + 0.10 + 2.0) = 6.075.
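
The arithmetic of Example 1 may be repeated by machine as a check on (v) and (vi). The sketch below assumes a table of counts (N = 40) with A-scores 0, 1, 2 and B-scores 1, 2, 3, 4, chosen because it reproduces every product and total quoted in the example; the variable names are illustrative and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

a_scores = [0, 1, 2]                 # column border-scores a_i
b_scores = [1, 2, 3, 4]              # row border-scores b_j
counts = [                           # counts[j][i]: row j, column i (assumed data)
    [1, 3, 4],
    [2, 2, 8],
    [3, 4, 5],
    [2, 1, 5],
]
r, c = len(b_scores), len(a_scores)
N = sum(map(sum, counts))

y = [[Fraction(n, N) for n in row] for row in counts]      # frequencies y_ij
x = [[a * a * b for a in a_scores] for b in b_scores]      # cell-scores x = a^2 b

col_tot = [sum(y[j][i] for j in range(r)) for i in range(c)]   # y_i.
row_tot = [sum(y[j]) for j in range(r)]                        # y_.j

# Equation (v): means within each column and within each row.
col_means = [sum(y[j][i] * x[j][i] for j in range(r)) / col_tot[i] for i in range(c)]
row_means = [sum(y[j][i] * x[j][i] for i in range(c)) / row_tot[j] for j in range(r)]

# Equation (vi): the grand mean, and the same value from the column or row means.
grand = sum(y[j][i] * x[j][i] for j in range(r) for i in range(c))
grand_from_cols = sum(col_tot[i] * col_means[i] for i in range(c))
grand_from_rows = sum(row_tot[j] * row_means[j] for j in range(r))
```

With these counts the column means come out as 0, 23/10 and 10, the row means as 19/8, 17/3, 6 and 21/2, and the grand mean as 243/40, in agreement with the decimal values of the example.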


* * * * * * *

The customary notation of double summation in (iv)-(vi) above is cumbersome; and we
can greatly simplify such expressions by the following code:

$$\frac{1}{y_{i.}} \sum_{j=0}^{r-1} y_{ij}(\ldots) = E_{b.a}(\ldots) \tag{vii}$$

$$\frac{1}{y_{.j}} \sum_{i=0}^{c-1} y_{ij}(\ldots) = E_{a.b}(\ldots) \tag{viii}$$

$$\sum_{i=0}^{c-1} y_{i.}(\ldots) = E_a(\ldots) \tag{ix}$$

$$\sum_{j=0}^{r-1} y_{.j}(\ldots) = E_b(\ldots) \tag{x}$$

$$\sum_{i=0}^{c-1} \sum_{j=0}^{r-1} y_{ij}(\ldots) = E(\ldots) = \sum_{j=0}^{r-1} \sum_{i=0}^{c-1} y_{ij}(\ldots) \tag{xi}$$

Thus $E$ signifies the operation of extracting the grand mean of the cell-scores; $E_{b.a}$ and
$E_{a.b}$ respectively signify extracting the column (b-scores variable, a-scores fixed) or the row
(a-scores variable, b-scores fixed) mean ; $E_a$ and $E_b$ respectively signify extracting the mean of
the column mean or border scores and of the row mean or border scores. In this notation
(v)-(vi) become

$$M_{x.i} = E_{b.a}(x_{ij}) \, ; \qquad M_{x.j} = E_{a.b}(x_{ij}) \, ;$$

$$E_a(M_{x.i}) = E_a \cdot E_{b.a}(x_{ij}) = E(x_{ij}) = E_b \cdot E_{a.b}(x_{ij}) = E_b(M_{x.j}).$$

Whence we see that

$$E_a \cdot E_{b.a} = E = E_b \cdot E_{a.b} \tag{xii}$$
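
The code of operators lends itself to a direct transcription. The following sketch is one possible rendering, not the author's: the function names mirror the symbols of (vii)-(xi), the counts are an assumed reading of the Example 1 data, and the cell-score is passed as a function of the column index i and row index j. It confirms (xii) for the cell-score $a^2 b$.

```python
from fractions import Fraction

a_scores = [0, 1, 2]
b_scores = [1, 2, 3, 4]
counts = [[1, 3, 4], [2, 2, 8], [3, 4, 5], [2, 1, 5]]   # counts[j][i], assumed data
r, c = len(b_scores), len(a_scores)
N = sum(map(sum, counts))
y = [[Fraction(n, N) for n in row] for row in counts]
y_col = [sum(y[j][i] for j in range(r)) for i in range(c)]   # y_i.
y_row = [sum(y[j]) for j in range(r)]                        # y_.j


def E_ba(f, i):
    """(vii): mean within column i, the a-score being held fixed."""
    return sum(y[j][i] * f(i, j) for j in range(r)) / y_col[i]


def E_ab(f, j):
    """(viii): mean within row j, the b-score being held fixed."""
    return sum(y[j][i] * f(i, j) for i in range(c)) / y_row[j]


def E_a(g):
    """(ix): mean over columns of a quantity g(i) attached to each column."""
    return sum(y_col[i] * g(i) for i in range(c))


def E_b(g):
    """(x): mean over rows of a quantity g(j) attached to each row."""
    return sum(y_row[j] * g(j) for j in range(r))


def E(f):
    """(xi): grand mean of the cell-scores."""
    return sum(y[j][i] * f(i, j) for i in range(c) for j in range(r))


def cell_score(i, j):
    return a_scores[i] ** 2 * b_scores[j]


via_columns = E_a(lambda i: E_ba(cell_score, i))
via_rows = E_b(lambda j: E_ab(cell_score, j))
grand_mean = E(cell_score)
```

Both composite operations return the grand mean, which is all that (xii) asserts.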


The reader will note that the dot in the subscript is consistent with the customary notation 
of partial correlation introduced in Chapter 9 of Vol. I, viz. the letter after the dot indicates 
which dimension we hold constant during the operation. The same convention makes it easier
to write partial differentials with the typewriter. Thus for u = f(x, y, z) we may write

2 
Dr > Due jo ar etc. 

We shall see that the new code will make it possible to derive very briefly certain summarising 
tautologies both of the correlation grid and of the independence grid, and we shall later (Chapter 
19) use it extensively in connexion with the non-replacement problem. In some situations our 
cell-score hitherto called x;; will be a function of both the a-score and the b-score ; but it may 
be the a-score or a simple function of the a-score alone, the b-score or a simple function of the 
b-score alone. In any case, we shall adopt the fixed convention that the a-scores are the column 
border scores, and we speak accordingly of summation in the A-dimension as summation from 
column to column, i.e. within a row if we hold the b-score constant as in the operation $E_{a.b}$.
Similarly, summation in the B-dimension is from row to row, i.e. within a column if we hold
the a-score constant as in the operation $E_{b.a}$.

To use the code with minimum effort, the reader should familiarise himself or herself with 
certain rules some of which refer to operations involving functions of one set of border-scores 
only, some referable to functions of both. On that account, it will be convenient to label our 
scores accordingly. If we refer to the border-scores as $x_a$, $x_b$ we may define

$u_i = x_a^h$ throughout column i when h is an integer     (xiii)

$v_j = x_b^k$ throughout row j when k is an integer     (xiv)

$w_{ij}$ = any single-valued function of both $x_a^h$ and $x_b^k$ in the cell (i, j), when h and k are integers
or zero     (xv)


For brevity we shall also use $M_{u.j}$, $M_{w.j}$ for row-means, $M_{v.i}$, $M_{w.i}$ for column-means,
$M_u$, $M_v$, $M_w$ for grand means of $u_i$, $v_j$, $w_{ij}$ and $M_a$, $M_b$ for the grand mean of $x_a$ and $x_b$. Some
of the ensuing rules apply to all the E-operations in which case we shall use $E_g$ generically.


Rule 1. Redundant Operations 
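Since $u_i$ is constant throughout the column and $v_j$ throughout the row

$$E_{b.a}(u_i) = u_i \, ; \qquad E_{a.b}(v_j) = v_j \, ,$$

$$\therefore \; E(u_i) = E_a \cdot E_{b.a}(u_i) = E_a(u_i) \, ; \qquad E(v_j) = E_b \cdot E_{a.b}(v_j) = E_b(v_j) \tag{xvi}$$

Notice, however, that

$$E_{b.a}(v_j) = M_{v.i} \;\text{ and }\; E_a(M_{v.i}) = M_v \, ; \qquad E_{a.b}(u_i) = M_{u.j} \;\text{ and }\; E_b(M_{u.j}) = M_u \tag{xvii}$$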


Rule 2. Change of Origin and Scale 


From elementary arithmetical considerations we know that the effect of a scalar constant is 
multiplicative and that of an additive constant is additive in the process of extracting the mean 
value of a score function, i.e.

$$E_g(K \cdot u_i + C) = K \cdot E_g(u_i) + C \tag{xviii}$$

Since $u_i$ is a constant in the B-dimension and $v_j$ in the A-dimension of the grid, we have

$$E(u_i v_j) = E_a \cdot E_{b.a}(u_i v_j) = E_a[u_i \cdot E_{b.a}(v_j)] \, ,$$

$$E(u_i v_j) = E_b \cdot E_{a.b}(u_i v_j) = E_b[v_j \cdot E_{a.b}(u_i)].$$

We may write the above as

$$E_a(u_i \cdot M_{v.i}) = E(u_i \cdot v_j) = E_b(v_j \cdot M_{u.j}) \tag{xix}$$


Example 2.—Put $y_a = (3x_a + 1)$ and $z_b = (2x_b + 5)$ in the foregoing numerical illustration
(Ex. 11.01) so that the column border-scores become 1, 4, 7, 10 and the row border-scores 5, 7, 9, 11.
Now find the row-means, column-means and grand mean of $y_a^3$, $z_b^2$. Compute also the corresponding
mean values of $x_a^3$ and $x_b^2$, and check (xix) when $u_i = x_a^3$ and $v_j = x_b^2$.


Rule 3. Sum or Difference of Means 
It follows from the last two rules that

$$E_{b.a}(u_i \pm v_j) = u_i \pm E_{b.a}(v_j) = u_i \pm M_{v.i} \, ;$$

$$E_{a.b}(u_i \pm v_j) = E_{a.b}(u_i) \pm v_j = M_{u.j} \pm v_j \, ,$$

$$\therefore \; E(u_i \pm v_j) = E_a \cdot E_{b.a}(u_i \pm v_j) = E_a(u_i) \pm E_a(M_{v.i}) \, ,$$

$$\therefore \; E(u_i \pm v_j) = E(u_i) \pm E(v_j) \tag{xx}$$

If $s_{ij} = (u_i + v_j)$, or $d_{ij} = (u_i - v_j)$ we may write this as

$$M_s = M_u + M_v \, ; \qquad M_d = M_u - M_v.$$

In connexion with moments, it is important to recall the rule in this form since

$$s_{ij} - M_s = (u_i - M_u) + (v_j - M_v) \tag{xxi}$$

$$d_{ij} - M_d = (u_i - M_u) - (v_j - M_v) \tag{xxii}$$

Example 3.—Test this rule by making a sum and difference table from the data of Example 1
for $w_{ij} = (u_i \pm v_j)$ when $u_i = x_a^3$ and $v_j = x_b^2$ as below:


Sum ($x_a^3 + x_b^2$)          Frequencies (× 40)          Differences ($x_a^3 - x_b^2$)


Rule 4. Partition of Variance 


One of the most important summarising tautologies of the grid concerns the partition of variance 
w.r.t. either set of border-scores $x_a$ and $x_b$. We shall write for the total variance

$$V_a = E(x_a - M_a)^2 = E(x_a^2) - M_a^2 \quad\text{and}\quad V_b = E(x_b - M_b)^2 = E(x_b^2) - M_b^2 \tag{xxiii}$$

For the variance within the row or within the column we have

$$V_{a.b} = E_{a.b}(x_a - M_{a.b})^2 = E_{a.b}(x_a^2) - M_{a.b}^2$$

and

$$V_{b.a} = E_{b.a}(x_b - M_{b.a})^2 = E_{b.a}(x_b^2) - M_{b.a}^2$$

The mean values of the above are

$$M(V_{a.b}) = E(x_a^2) - E_b(M_{a.b}^2) \quad\text{and}\quad M(V_{b.a}) = E(x_b^2) - E_a(M_{b.a}^2) \tag{xxiv}$$

For the variance of the row- and column-means we have

$$V(M_{a.b}) = E_b(M_{a.b} - M_a)^2 = E_b(M_{a.b}^2) - M_a^2$$

and

$$V(M_{b.a}) = E_a(M_{b.a} - M_b)^2 = E_a(M_{b.a}^2) - M_b^2 \tag{xxv}$$

By combining (xxiii)-(xxv) we have

$$V_a = M(V_{a.b}) + V(M_{a.b}) \quad\text{and}\quad V_b = M(V_{b.a}) + V(M_{b.a}) \tag{xxvi}$$


The reader should note that this is a necessary numerical property of any set of numbers 
laid out as a score-frequency grid ; and should test it by recourse to the data of Example 1. 
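
The test suggested to the reader can be carried out as follows. The sketch assumes a grid of counts (a reading of the Example 1 data, N = 40) and partitions the variance of the column border-scores $x_a$ into the mean variance within rows and the variance of the row means; the variable names are illustrative.

```python
from fractions import Fraction

a_scores = [0, 1, 2]                 # column border-scores x_a
b_scores = [1, 2, 3, 4]              # row border-scores x_b
counts = [[1, 3, 4], [2, 2, 8], [3, 4, 5], [2, 1, 5]]   # counts[j][i], assumed data
r, c = len(b_scores), len(a_scores)
N = sum(map(sum, counts))
y = [[Fraction(n, N) for n in row] for row in counts]
y_row = [sum(y[j]) for j in range(r)]                   # y_.j

# Total variance of x_a, equation (xxiii).
M_a = sum(y[j][i] * a_scores[i] for j in range(r) for i in range(c))
V_a = sum(y[j][i] * a_scores[i] ** 2 for j in range(r) for i in range(c)) - M_a ** 2

# Mean and variance of x_a within each row (b-score held fixed).
M_ab = [sum(y[j][i] * a_scores[i] for i in range(c)) / y_row[j] for j in range(r)]
V_ab = [sum(y[j][i] * a_scores[i] ** 2 for i in range(c)) / y_row[j] - M_ab[j] ** 2
        for j in range(r)]

mean_within = sum(y_row[j] * V_ab[j] for j in range(r))            # M(V_a.b), (xxiv)
between = sum(y_row[j] * (M_ab[j] - M_a) ** 2 for j in range(r))   # V(M_a.b), (xxv)
```

The sum of `mean_within` and `between` equals `V_a` exactly, as (xxvi) requires of any set of numbers so laid out.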


Rule 5. Moments of a Discrete Distribution 


With the aid of Rules 1-3 we may compress into a single expression several important properties 
of moments. As defined in Chapter 6 of Vol. I, the pth zero moment ($\mu_p$) of a score distribution
is the mean value of the pth power of the scores and the pth mean moment ($m_p$) is the mean value
of the pth power of the score deviations from their mean value. For the column border-scores
($x_a$) we thus write

$${}_a\mu_p = E(x_a^p) \quad\text{and}\quad {}_am_p = E(x_a - M_a)^p = E(x_a - {}_a\mu_1)^p$$


The properties we shall now exhibit are derivable from the mean value of a single expression 
involving both sets of border-scores. On the assumption that p is an integer, we define it as

$$[(x_a - C) \pm (x_b - K)]^p = F = [(x_a \pm x_b) - (C \pm K)]^p \tag{xxvii}$$

We may expand this by the binomial theorem in two ways as

$$\sum_{z=0}^{p} (\pm 1)^{p-z} p_{(z)} (x_a - C)^z (x_b - K)^{p-z} = F = \sum_{z=0}^{p} (-1)^{p-z} p_{(z)} (x_a \pm x_b)^z (C \pm K)^{p-z} \tag{xxviii}$$

We may therefore write

$$\sum_{z=0}^{p} (\pm 1)^{p-z} p_{(z)} E[(x_a - C)^z (x_b - K)^{p-z}] = E(F) = \sum_{z=0}^{p} (-1)^{p-z} p_{(z)} E[(x_a \pm x_b)^z] (C \pm K)^{p-z} \tag{xxix}$$

Let us now put $x_b = 0 = C$ and $K = {}_a\mu_1$ in (xxvii), the ambiguous sign being positive, so that

$$E(F) = E(x_a - {}_a\mu_1)^p = {}_am_p.$$

We then have the rule for expanding the mean moment in terms of zero moments, viz. by re-
course to the right hand expression in (xxix) :

$$E(F) = \sum_{z=0}^{p} (-1)^{p-z} p_{(z)} \cdot E(x_a^z) \cdot {}_a\mu_1^{p-z} \, ,$$

$$\therefore \; {}_am_p = \sum_{z=0}^{p} (-1)^{p-z} p_{(z)} \cdot {}_a\mu_z \cdot {}_a\mu_1^{p-z} \tag{xxx}$$

For example (the left hand subscript being redundant),

$$m_4 = \mu_4 - 4\mu_3\mu_1 + 6\mu_2\mu_1^2 - 3\mu_1^4.$$


If we put $x_b = 0$ as before, but $C = {}_a\mu_1 = -K$ in (xxvii), again interpreting the ambiguous
sign as positive on the left of (xxix):

$$E(F) = E(x_a^p) = {}_a\mu_p \, ,$$

$${}_a\mu_p = \sum_{z=0}^{p} p_{(z)} \cdot E(x_a - {}_a\mu_1)^z \cdot {}_a\mu_1^{p-z} \, ,$$

$$\therefore \; {}_a\mu_p = \sum_{z=0}^{p} p_{(z)} \cdot {}_am_z \cdot {}_a\mu_1^{p-z} \tag{xxxi}$$

For example, since $m_1 = 0$ and $m_0 = 1$ for any distribution

$$\mu_4 = m_4 + 4\mu_1 m_3 + 6\mu_1^2 m_2 + \mu_1^4.$$
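
The two conversion rules between zero moments and mean moments are easily verified on a small discrete distribution. The sketch below reads $p_{(z)}$ as the binomial coefficient (an assumption about the notation carried over from Vol. I) and indexes each sum so that z is the order of the moment and p − z the power of the mean; the scores, frequencies and function names are illustrative.

```python
from fractions import Fraction
from math import comb

# An illustrative discrete distribution of three scores.
scores = [0, 1, 2]
freqs = [Fraction(1, 5), Fraction(1, 4), Fraction(11, 20)]


def zero_moment(p):
    """pth zero moment: mean value of the pth power of the scores."""
    return sum(f * s ** p for s, f in zip(scores, freqs))


mu1 = zero_moment(1)


def mean_moment(p):
    """pth mean moment: mean value of the pth power of deviations from the mean."""
    return sum(f * (s - mu1) ** p for s, f in zip(scores, freqs))


def mean_from_zero(p):
    """Mean moment expanded in zero moments, after the pattern of (xxx)."""
    return sum((-1) ** (p - z) * comb(p, z) * zero_moment(z) * mu1 ** (p - z)
               for z in range(p + 1))


def zero_from_mean(p):
    """Zero moment expanded in mean moments, after the pattern of (xxxi)."""
    return sum(comb(p, z) * mean_moment(z) * mu1 ** (p - z) for z in range(p + 1))


# The worked case p = 4 quoted in the text.
m4_by_formula = (zero_moment(4) - 4 * zero_moment(3) * mu1
                 + 6 * zero_moment(2) * mu1 ** 2 - 3 * mu1 ** 4)
```

Since the binomial coefficient is symmetrical in z and p − z, reversing the order of summation leaves either rule unchanged.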


Let us now put $C = {}_a\mu_1$ and $K = {}_b\mu_1$ retaining $x_b$ in (xxvii). From (xxi)-(xxii) above, if
$x_s = (x_a + x_b)$ is the score-sum and $x_d = (x_a - x_b)$ is the raw score difference, we have

$${}_s\mu_1 = {}_a\mu_1 + {}_b\mu_1 = C + K \quad\text{and}\quad E(F) = E(x_s - {}_s\mu_1)^p = {}_sm_p \, ,$$

$${}_d\mu_1 = {}_a\mu_1 - {}_b\mu_1 = C - K \quad\text{and}\quad E(F) = E(x_d - {}_d\mu_1)^p = {}_dm_p.$$

With appropriate interpretation of the ambiguous sign (positive for the sum, negative for the
difference) the left of (xxix) gives us the mean moments of the score-sum and raw-score difference
distributions as

$${}_sm_p = \sum_{z=0}^{p} p_{(z)} E[(x_a - {}_a\mu_1)^z (x_b - {}_b\mu_1)^{p-z}] \tag{xxxii}$$

$${}_dm_p = \sum_{z=0}^{p} (-1)^{p-z} p_{(z)} E[(x_a - {}_a\mu_1)^z (x_b - {}_b\mu_1)^{p-z}] \tag{xxxiii}$$

We shall later see how to interpret these expressions if the two sets of border-scores are
independent. Meanwhile, we can obtain at once the familiar variance formulae

$${}_sm_2 = E(x_a - M_a)^2 + 2E(x_a - M_a)(x_b - M_b) + E(x_b - M_b)^2 \, ;$$

$${}_dm_2 = E(x_a - M_a)^2 - 2E(x_a - M_a)(x_b - M_b) + E(x_b - M_b)^2.$$

The mean value of the product of the border-score deviations is, of course, the covariance of
the distribution. Thus the above are equivalent to what we have hitherto written as

$$V(x_a \pm x_b) = V_a + V_b \pm 2\,\mathrm{Cov}(x_a, x_b).$$

Independence implies that $\mathrm{Cov}(x_a, x_b) = 0$, in which case

$$V(x_a \pm x_b) = V_a + V_b.$$
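
The variance formulae for the score-sum and raw-score difference can be checked on any grid, independent or otherwise. The sketch below uses an assumed grid of counts (N = 40) in which the border-scores are not independent, so that the covariance term does not vanish; the helper `E` and the other names are illustrative.

```python
from fractions import Fraction

a_scores = [0, 1, 2]
b_scores = [1, 2, 3, 4]
counts = [[1, 3, 4], [2, 2, 8], [3, 4, 5], [2, 1, 5]]   # counts[j][i], assumed data
N = sum(map(sum, counts))
y = [[Fraction(n, N) for n in row] for row in counts]


def E(f):
    """Grand mean of f(x_a, x_b) over the whole grid."""
    return sum(y[j][i] * f(a, b)
               for j, b in enumerate(b_scores)
               for i, a in enumerate(a_scores))


M_a, M_b = E(lambda a, b: a), E(lambda a, b: b)
V_a = E(lambda a, b: (a - M_a) ** 2)
V_b = E(lambda a, b: (b - M_b) ** 2)
cov = E(lambda a, b: (a - M_a) * (b - M_b))

# Second mean moments of the score-sum and of the raw-score difference.
V_sum = E(lambda a, b: (a + b - (M_a + M_b)) ** 2)
V_diff = E(lambda a, b: (a - b - (M_a - M_b)) ** 2)
```

Here `V_sum` and `V_diff` differ from `V_a + V_b` by exactly twice the covariance, with the sign appropriate to each.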


Expressions for the zero moments of the score-sum and raw-score difference distributions
are obtainable from the left hand side of (xxix) if we put C = 0 = K, so that

$$E(F) = E(x_a \pm x_b)^p = {}_s\mu_p \;\text{ or }\; {}_d\mu_p.$$

With appropriate use of the ambiguous sign we then have

$${}_s\mu_p = \sum_{z=0}^{p} p_{(z)} E(x_a^z \cdot x_b^{p-z}) \tag{xxxiv}$$

$${}_d\mu_p = \sum_{z=0}^{p} (-1)^{p-z} p_{(z)} E(x_a^z \cdot x_b^{p-z}) \tag{xxxv}$$

Example 4.—Test (xxx) and (xxxi) by recourse to the data of Example 1. 


Rule 6. Score Products 


Let us now examine the implications of the following identity :

$$E(x_a^h - {}_a\mu_h)(x_b^k - {}_b\mu_k) = E(x_a^h \cdot x_b^k) - {}_a\mu_h E(x_b^k) - {}_b\mu_k E(x_a^h) + {}_a\mu_h \cdot {}_b\mu_k \, ,$$

$$\therefore \; E(x_a^h - {}_a\mu_h)(x_b^k - {}_b\mu_k) = E(x_a^h \cdot x_b^k) - {}_a\mu_h \cdot {}_b\mu_k \tag{xxxvi}$$

A particular case is the now familiar covariance formula. When h = 1 = k :

$$E(x_a - M_a)(x_b - M_b) = \mathrm{Cov}(x_a, x_b) = E(x_a \cdot x_b) - M_a M_b \tag{xxxvii}$$

The more general expression defined by (xxxvi) will be of interest in defining criteria of indepen-
dence. We may speak of the expression on the left as a covariance of order (h, k) and the first
term on the right as a zero co-moment of order (h, k), and by analogy with the notation of moments*
write this result briefly as

$$\mathrm{Cov}(x_a^h, x_b^k) = \mu_{hk} - {}_a\mu_h \cdot {}_b\mu_k \tag{xxxviii}$$


* It is arguable that the expression covariance of order (h, k) is more fittingly applicable to the mean co-moment :

$$E(x_a - {}_a\mu_1)^h (x_b - {}_b\mu_1)^k.$$

However, we may interpret the alternative expression on the left of (xxxvi) as an ordinary covariance by the
substitutions

$$u = x_a^h \, ; \;\; \mu_1(u) = {}_a\mu_h \quad\text{and}\quad v = x_b^k \, ; \;\; \mu_1(v) = {}_b\mu_k.$$

We then have

$$\mathrm{Cov}(u, v) = E[u - \mu_1(u)][v - \mu_1(v)] = E(x_a^h - {}_a\mu_h)(x_b^k - {}_b\mu_k).$$

Example 5.—Test (xxxvi) for h = 3 and k = 2 from the data of Example 1.


The following identity will sometimes prove useful. If $X_a = (x_a - M_a)$ and $X_b = (x_b - M_b)$ :

$$x_a \cdot X_b = x_a \cdot x_b - M_b \cdot x_a \quad\text{and}\quad x_b \cdot X_a = x_a \cdot x_b - M_a \cdot x_b \, ,$$

$$\therefore \; E(x_a \cdot X_b) = E(x_a \cdot x_b) - M_b \cdot E(x_a) \quad\text{and}\quad E(x_b \cdot X_a) = E(x_a \cdot x_b) - M_a \cdot E(x_b) \, ,$$

$$\therefore \; E(x_a \cdot X_b) = E(x_a \cdot x_b) - M_a \cdot M_b = E(x_b \cdot X_a) \, ,$$

$$\therefore \; E(X_a \cdot X_b) = E(x_a \cdot X_b) = E(x_b \cdot X_a) \tag{xxxix}$$

We may likewise express the covariance in the following way

$$E(x_a - M_a)(x_b - M_b) = E_b \cdot E_{a.b}[(x_a - M_a)(x_b - M_b)] = E_b[(x_b - M_b) \cdot E_{a.b}(x_a - M_a)] \, ,$$

$$\therefore \; E(x_a - M_a)(x_b - M_b) = E_b \cdot X_b(M_{a.b} - M_a).$$

Similarly,

$$E(x_a - M_a)(x_b - M_b) = E_a \cdot E_{b.a}[(x_a - M_a)(x_b - M_b)] = E_a[(x_a - M_a) \cdot E_{b.a}(x_b - M_b)] \, ,$$

$$\therefore \; E(x_a - M_a)(x_b - M_b) = E_a \cdot X_a(M_{b.a} - M_b).$$

Whence we have

$$E_b(x_b - M_b)(M_{a.b} - M_a) = \mathrm{Cov}(x_a, x_b) = E_a(x_a - M_a)(M_{b.a} - M_b) \tag{xl}$$
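
The identity (xl) states that the covariance is recoverable from the row means of $x_a$ alone, or from the column means of $x_b$ alone. A numerical check follows, again on an assumed grid of counts (N = 40) and with illustrative variable names.

```python
from fractions import Fraction

a_scores = [0, 1, 2]
b_scores = [1, 2, 3, 4]
counts = [[1, 3, 4], [2, 2, 8], [3, 4, 5], [2, 1, 5]]   # counts[j][i], assumed data
r, c = len(b_scores), len(a_scores)
N = sum(map(sum, counts))
y = [[Fraction(n, N) for n in row] for row in counts]
y_col = [sum(y[j][i] for j in range(r)) for i in range(c)]   # y_i.
y_row = [sum(y[j]) for j in range(r)]                        # y_.j

M_a = sum(y_col[i] * a_scores[i] for i in range(c))
M_b = sum(y_row[j] * b_scores[j] for j in range(r))

# Ordinary covariance taken over every cell of the grid.
cov = sum(y[j][i] * (a_scores[i] - M_a) * (b_scores[j] - M_b)
          for j in range(r) for i in range(c))

# Row means of x_a (b fixed) and column means of x_b (a fixed).
M_ab = [sum(y[j][i] * a_scores[i] for i in range(c)) / y_row[j] for j in range(r)]
M_ba = [sum(y[j][i] * b_scores[j] for j in range(r)) / y_col[i] for i in range(c)]

# The two outer members of (xl).
via_rows = sum(y_row[j] * (b_scores[j] - M_b) * (M_ab[j] - M_a) for j in range(r))
via_cols = sum(y_col[i] * (a_scores[i] - M_a) * (M_ba[i] - M_b) for i in range(c))

# Identity (xxxix): only one of the two scores need be taken as a deviation.
half_centred = sum(y[j][i] * a_scores[i] * (b_scores[j] - M_b)
                   for j in range(r) for i in range(c))
```

All four quantities coincide, so that a table of row means (or column means) with its border totals suffices for computing the covariance.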


11.03 THE INDEPENDENCE GRID 


To say that two distributions such as those of the border-scores $x_a$, $x_b$ of our grid in 11.02 are
independent in the statistical sense of the term is to say that the cell frequencies are in accord with
the principle of equipartition of opportunity for association, i.e. with the product rule specified
by (i) of 11.02, viz. :

$$y_{ij} = y_{i.} \cdot y_{.j}$$

If we now go back to our code, we see that (vii) and (viii) of 11.02 then mean

$$E_{b.a}(\ldots) = \sum_{j=0}^{r-1} y_{.j}(\ldots) = E_b(\ldots) \, ;$$

$$E_{a.b}(\ldots) = \sum_{i=0}^{c-1} y_{i.}(\ldots) = E_a(\ldots).$$

If, as before, $u_i = x_a^h$ and $v_j = x_b^k$, this means

$$M_{v.i} = M_v = E_b(v_j) \quad\text{and}\quad M_{u.j} = M_u = E_a(u_i).$$

In the notation of moments, we write this as

$${}_{b.a}\mu_k = E_{b.a}(x_b^k) = {}_b\mu_k \quad\text{and}\quad {}_{a.b}\mu_h = E_{a.b}(x_a^h) = {}_a\mu_h \tag{i}$$

In the same way, we interpret (xix) in 11.02 as follows :

$$E_a(u_i \cdot M_{v.i}) = E(u_i \cdot v_j) = E_b(v_j \cdot M_{u.j}) \, ,$$

$$M_v \cdot E_a(u_i) = E(u_i \cdot v_j) = M_u \cdot E_b(v_j) \, ,$$

$$\therefore \; E(u_i \cdot v_j) = M_u \cdot M_v.$$

In the same notation of (xxxviii) this is equivalent to

$$\mu_{hk} = {}_a\mu_h \cdot {}_b\mu_k \tag{ii}$$

Whence independence implies in virtue of (xxxvi) :

$$\mathrm{Cov}(x_a^h, x_b^k) = 0 \tag{iii}$$

When h = 1 = k we write this as $\mathrm{Cov}(x_a, x_b) = 0$.

Since mean moments are expressible in terms of zero moments by (xxx) of 11.02, it scarcely
needs formal proof that independence also implies the following relation analogous to (i) :

$${}_{b.a}m_k = E_{b.a}(x_b - M_b)^k = {}_bm_k \quad\text{and}\quad {}_{a.b}m_h = E_{a.b}(x_a - M_a)^h = {}_am_h.$$
Thus we may define the two following criteria of independence : 


(i) with respect to moments of any order those of the B-score distributions within each
column are the same and those of the A-score distribution within each row are the
same ;

(ii) the covariance of any order (h, k) as defined by (iii) above is zero.


The reader should distinguish between a covariance of order (h, k) as defined by (iii) and
the mean value of the product $E(x_a - M_a)^h(x_b - M_b)^k$ which is, of course, equivalent when
$h = 1 = k$. We shall need to interpret this by recourse to a more general property of independence
than is implicit in (ii). We suppose that $F_a$ is any single-valued function of $x_a$ alone and
$F_b$ any single-valued function of $x_b$ alone, so that $F_a$ is constant in the domain of the operation
$E_{b.a}(\ldots)$ and $F_b$ is constant in that of the operation $E_{a.b}(\ldots)$. We may thus write for the
mean value of the product :
$$E(F_a \cdot F_b) = E_a[F_a \cdot E_{b.a}(F_b)].$$
In virtue of independence $E_{b.a}(F_b) = E_b(F_b) = E(F_b)$ which is a constant of the B-score
distribution, so that
$$E(F_a \cdot F_b) = E_a[F_a \cdot E(F_b)] = E(F_a) \cdot E(F_b) \qquad \text{(iv)}$$
If $F_a = (x_a - M_a)^h$ and $F_b = (x_b - M_b)^k$, independence therefore implies that
$$E(x_a - M_a)^h(x_b - M_b)^k = {}_am_h \cdot {}_bm_k \qquad \text{(v)}$$


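A useful extension of moment notation arises in this connexion. It is consistent with our usage
to put
$$E(x^{-p}) = \mu_{-p} \quad\text{and}\quad E(x - M)^{-p} = m_{-p} \qquad \text{(vi)}$$
Whence independence implies
$$E\left(\frac{x_a^p}{x_b^p}\right) = {}_a\mu_p \cdot {}_b\mu_{-p} \quad\text{and}\quad E\left[\frac{(x_a - M_a)^p}{(x_b - M_b)^p}\right] = {}_am_p \cdot {}_bm_{-p}.$$
In more general terms, we may write as a consequence of independence
$$E\left(\frac{F_a}{F_b}\right) = E(F_a) \cdot E\left(\frac{1}{F_b}\right) \qquad \text{(vii)}$$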


We are now in a position to interpret (xxxii)-(xxxv) in 11.02 when the two score distributions
are independent. For the sum distribution of the score-sum $x_s = (x_a + x_b)$, we then write (xxxii)
and (xxxiv) in the form
$${}_s\mu_p = \sum_{z=0}^{z=p} p_{(z)} \cdot {}_a\mu_z \cdot {}_b\mu_{p-z} \qquad \text{(viii)}$$
$${}_sm_p = \sum_{z=0}^{z=p} p_{(z)} \cdot {}_am_z \cdot {}_bm_{p-z} \qquad \text{(ix)}$$
Similarly for the distribution of the raw-score difference $x_d = (x_a - x_b)$, we derive from (xxxiii)
and (xxxv) :
$${}_d\mu_p = \sum_{z=0}^{z=p} (-1)^{p-z} p_{(z)} \cdot {}_a\mu_z \cdot {}_b\mu_{p-z} \qquad \text{(x)}$$
$${}_dm_p = \sum_{z=0}^{z=p} (-1)^{p-z} p_{(z)} \cdot {}_am_z \cdot {}_bm_{p-z} \qquad \text{(xi)}$$
For example, we have
$${}_s\mu_3 = {}_a\mu_3 + 3\,{}_a\mu_2 \cdot {}_b\mu_1 + 3\,{}_a\mu_1 \cdot {}_b\mu_2 + {}_b\mu_3 ;$$
$${}_d\mu_3 = {}_a\mu_3 - 3\,{}_a\mu_2 \cdot {}_b\mu_1 + 3\,{}_a\mu_1 \cdot {}_b\mu_2 - {}_b\mu_3.$$
If we recall that $m_1 = 0$ for any distribution, we may likewise write
$${}_sm_3 = {}_am_3 + {}_bm_3 \quad\text{and}\quad {}_dm_3 = {}_am_3 - {}_bm_3.$$


We shall use the foregoing results extensively in Chapter 14. 
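The moment relations for the score-sum and the raw-score difference lend themselves to a numerical check. The sketch below builds an independence grid from two hypothetical border-score distributions (the frequencies are assumptions made for illustration, not data from the text) and compares the third moments of the sum and difference distributions with (viii)-(xi), on the assumption that $p_{(z)}$ denotes the binomial coefficient.

```python
from fractions import Fraction as F
from math import comb

# Hypothetical border-score distributions: score -> frequency.
freq_a = {0: F(1, 4), 1: F(1, 2), 2: F(1, 4)}
freq_b = {0: F(1, 3), 1: F(1, 3), 3: F(1, 3)}

# Product rule: each cell frequency is the product of the border frequencies.
cells = {(a, b): fa * fb for a, fa in freq_a.items() for b, fb in freq_b.items()}

def zero_moment(freq, p):
    """p-th moment about the origin."""
    return sum(f * x ** p for x, f in freq.items())

def mean_moment(freq, p):
    """p-th moment about the mean."""
    m = zero_moment(freq, 1)
    return sum(f * (x - m) ** p for x, f in freq.items())

def distribution(combine):
    """Distribution of combine(x_a, x_b) over the cells of the grid."""
    out = {}
    for (a, b), f in cells.items():
        out[combine(a, b)] = out.get(combine(a, b), F(0)) + f
    return out

sum_freq = distribution(lambda a, b: a + b)
diff_freq = distribution(lambda a, b: a - b)

p = 3
# Right-hand sides of (viii) and (x), taking p_(z) as the binomial coefficient.
sum_by_viii = sum(comb(p, z) * zero_moment(freq_a, z) * zero_moment(freq_b, p - z)
                  for z in range(p + 1))
diff_by_x = sum((-1) ** (p - z) * comb(p, z) * zero_moment(freq_a, z) * zero_moment(freq_b, p - z)
                for z in range(p + 1))

print(zero_moment(sum_freq, p) == sum_by_viii)
print(zero_moment(diff_freq, p) == diff_by_x)
print(mean_moment(sum_freq, 3) == mean_moment(freq_a, 3) + mean_moment(freq_b, 3))
print(mean_moment(diff_freq, 3) == mean_moment(freq_a, 3) - mean_moment(freq_b, 3))
```

The same functions serve for other values of p; only the third-order case is shown because it is the one written out in full above.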
* * * * * * *


Before proceeding, the student may with profit perform the following exercises by recourse 
to the data of the following table in which the cell entries are whole numbers : 


Border-scores 0 : 1 2 3 


(i) Satisfy yourself that the cell entries obey the product rule. 
(ii) Find the third zero and fourth mean moments of the A-scores in each of the rows and 
the variance and fifth zero moment of the B-scores in each of the columns. 
(iii) Verify equation (iii) above for h = 1 = k, and for h = 3, k = 03. 
(iv) Make tables of the score-sum and the raw-score difference distributions and verify
(viii)-(xi) for p = 4.


* * * * * * *


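As a particular case of (i), we have in (xxv) of 11.02
$$M_{b.a} = M_b = {}_b\mu_1 \quad\text{and}\quad M_{a.b} = M_a = {}_a\mu_1.$$
Thus we have
$$V(M_{b.a}) = E_a(M_{b.a} - M_b)^2 = 0 = E_b(M_{a.b} - M_a)^2 = V(M_{a.b}) \qquad \text{(xii)}$$
We now recall the definition of the correlation ratios defined in Chapter 9 of Vol. I, viz. :
$$\eta_{ba}^2 = \frac{V(M_{b.a})}{V_b} \quad\text{and}\quad \eta_{ab}^2 = \frac{V(M_{a.b})}{V_a}.$$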


Thus the correlation ratios of a bivariate distribution are both zero if the component distributions
are statistically independent. The product-moment correlation coefficient is necessarily so by
definition, since $\mathrm{Cov}(x_a, x_b) = 0$. While this is a necessary consequence of independence,
it is important to appreciate that it is not a sufficient criterion of independence. That
is to say, a joint distribution may have zero covariance when the product rule does not apply.
In 11.08 we shall look at a few model situations which illustrate this possibility.
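A small numerical illustration may make the last point concrete. The grid below is a hypothetical one chosen for this purpose (it is not one of the models of 11.08): the A-score takes the values -1, 0, 1 with equal frequency and the B-score is its square, so that the B-score is wholly determined by the A-score although the covariance is zero.

```python
from fractions import Fraction as F

# Hypothetical joint distribution: x_b = x_a ** 2, each A-score equally frequent.
cells = {(-1, 1): F(1, 3), (0, 0): F(1, 3), (1, 1): F(1, 3)}

# Border (marginal) frequencies of the A- and B-scores.
freq_a, freq_b = {}, {}
for (a, b), f in cells.items():
    freq_a[a] = freq_a.get(a, F(0)) + f
    freq_b[b] = freq_b.get(b, F(0)) + f

mean_a = sum(f * a for a, f in freq_a.items())
mean_b = sum(f * b for b, f in freq_b.items())

# Covariance: E(x_a . x_b) - M_a . M_b.
cov = sum(f * a * b for (a, b), f in cells.items()) - mean_a * mean_b

# Product rule: every cell frequency equals the product of its border frequencies.
product_rule_holds = all(
    cells.get((a, b), F(0)) == fa * fb
    for a, fa in freq_a.items() for b, fb in freq_b.items()
)

print(cov, product_rule_holds)
```

Here the covariance vanishes because positive and negative A-scores contribute equal and opposite products, while empty cells such as (0, 1) violate the product rule.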


11.04 TAUTOLOGIES OF CORRELATION 


We are now in a position to set out certain summarising tautologies which we shall make use of
in Chapters 12 and 20. We say that there is perfect correlation between two sets of scores if
there is a one to one correspondence between them. Each row and each column of the score-frequency
grid must then contain only one entry. In our symbolism this implies that
$$V_{b.a} = 0 = V_{a.b},$$
$$\therefore\ M(V_{b.a}) = 0 = M(V_{a.b}).$$
Whence by (xxiv)-(xxv) of 11.02
$$V_b = V(M_{b.a}) \quad\text{and}\quad V_a = V(M_{a.b}).$$


We have already defined two grid parameters in terms of the above known as the correlation
ratios, viz. :
$$\eta_{ba}^2 = \frac{V(M_{b.a})}{V_b} \quad\text{and}\quad \eta_{ab}^2 = \frac{V(M_{a.b})}{V_a}.$$
In accordance with our definition, perfect correlation therefore implies the identities
$$\eta_{ab}^2 = 1 = \eta_{ba}^2 \qquad \text{(i)}$$
If there is independence, as shown in (xii) of 11.03
$$\eta_{ab}^2 = 0 = \eta_{ba}^2.$$
As a summarising index $\eta_{ab}$ has therefore the same essential properties as $r_{ab}$.

If the two sets of border-scores are not independent, it may happen that the column-means
(mean values of the B-score associated with fixed values of the A-score) or that the row-means
(mean values of the A-score associated with single values of the B-score) constitute an arithmetic
series increasing (or decreasing) by equal increments corresponding to equal increments of the
appropriate border-score. We then say that there is linear regression of the B- on the A-score
(column-means in arithmetic progression) or of the A- on the B-score (row-means in A.P.). If
$k_{ba}$ and $k_{ab}$ are scalar constants, called linear regression coefficients, and $K_{ba}$, $K_{ab}$ constants
referable to the origin, we then have
$$M_{b.i} = k_{ba} \cdot a_i + K_{ba} \quad\text{and}\quad M_{a.j} = k_{ab} \cdot b_j + K_{ab},$$
$$\therefore\ M_b = E_a(M_{b.a}) = k_{ba} \cdot E_a(a_i) + K_{ba} = k_{ba} \cdot M_a + K_{ba},$$
$$\therefore\ M_{b.i} - M_b = k_{ba}(a_i - M_a).$$
We shall henceforth write this equation definitive of linear regression of the B- on the A-score
in the form
$$M_{b.a} - M_b = k_{ba}(x_a - M_a) \qquad \text{(ii)}$$
In the same way, we shall write the equation definitive of linear regression of the A-score on the
B-score as
$$M_{a.b} - M_a = k_{ab}(x_b - M_b) \qquad \text{(iii)}$$
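If there is linear regression of the B- on the A-score, we derive from (ii) in our new notation :
$$x_a \cdot M_{b.a} = k_{ba} \cdot x_a^2 - k_{ba} \cdot M_a \cdot x_a + M_b \cdot x_a,$$
$$\therefore\ E_a(x_a \cdot M_{b.a}) = k_{ba} \cdot E(x_a^2) - k_{ba} \cdot M_a \cdot E(x_a) + M_b \cdot E(x_a)$$
$$= k_{ba} \cdot V_a + M_a \cdot M_b.$$
Whence by (xix) of 11.02 :
$$\mathrm{Cov}(x_a, x_b) = k_{ba} \cdot V_a \qquad \text{(iv)}$$
Likewise, linear regression of the A- on the B-score signifies that
$$\mathrm{Cov}(x_a, x_b) = k_{ab} \cdot V_b \qquad \text{(v)}$$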
We now recall the definition of the product-moment coefficient (Chapter 9 in Vol. I), viz. :
$$r_{ab} = \frac{\mathrm{Cov}(x_a, x_b)}{\sqrt{V_a \cdot V_b}}$$
Thus linear regression of the B-score on the A-score implies the relationship
$$r_{ab} = k_{ba}\frac{\sigma_a}{\sigma_b} \quad\text{and}\quad k_{ba} = r_{ab}\frac{\sigma_b}{\sigma_a} \qquad \text{(vi)}$$
Similarly, linear regression of the A-score on the B-score implies the relationship
$$r_{ab} = k_{ab}\frac{\sigma_b}{\sigma_a} \quad\text{and}\quad k_{ab} = r_{ab}\frac{\sigma_a}{\sigma_b} \qquad \text{(vii)}$$
As a criterion of linear regression in one or other dimension of the grid, it is customary to test the
correspondence of the product-moment formula of $r_{ab}$ with the so-called correlation ratios
which we have seen to have the same essential summarising properties as $r_{ab}$ itself. We shall
now see that regression is necessarily linear in the appropriate dimension when
$$\eta_{ba}^2 = r_{ab}^2 \quad\text{or}\quad \eta_{ab}^2 = r_{ab}^2.$$

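If there is linear regression of the B-score on the A-score we may write in virtue of (ii) above :
$$M_{b.a}^2 = [k_{ba}(x_a - M_a) + M_b]^2$$
$$= k_{ba}^2(x_a - M_a)^2 + 2k_{ba} \cdot M_b(x_a - M_a) + M_b^2,$$
$$\therefore\ M_{b.a}^2 - M_b^2 = k_{ba}^2(x_a - M_a)^2 + 2k_{ba} \cdot M_b \cdot x_a - 2k_{ba} \cdot M_a M_b,$$
$$\therefore\ E_a(M_{b.a}^2) - M_b^2 = k_{ba}^2 \cdot E(x_a - M_a)^2 + 2k_{ba} \cdot M_b \cdot E(x_a) - 2k_{ba} \cdot M_a M_b,$$
$$\therefore\ V(M_{b.a}) = k_{ba}^2 \cdot V_a + 2k_{ba} \cdot M_a M_b - 2k_{ba} \cdot M_a M_b,$$
$$\therefore\ V(M_{b.a}) = k_{ba}^2 \cdot V_a \qquad \text{(xi)}$$
Hence from (vi) above :
$$\eta_{ba}^2 = \frac{V(M_{b.a})}{V_b} = k_{ba}^2 \cdot \frac{V_a}{V_b} = r_{ab}^2 \qquad \text{(xii)}$$
In the same way, we derive the necessary condition of linear regression w.r.t. the A- on the B-score, i.e.
$$\eta_{ab}^2 = \frac{V(M_{a.b})}{V_a} = k_{ab}^2 \cdot \frac{V_b}{V_a} = r_{ab}^2 \qquad \text{(xiii)}$$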
We may derive (xii) alternatively by a method which is adaptable to show that this condition is
sufficient (p. 528) as well as necessary for linear regression. For brevity we shall write $\mathrm{Cov}(x_a, x_b)$
as $C_{ab}$. By (ii) and (iv) above, linear regression of the B- on the A-score signifies that
$$(M_{b.a} - M_b) = \frac{C_{ab}}{V_a}(x_a - M_a),$$
$$\therefore\ (M_{b.a} - M_b) - \frac{C_{ab}}{V_a}(x_a - M_a) = 0.$$
We shall now write for the expression on the left
$$D = (M_{b.a} - M_b) - \frac{C_{ab}}{V_a}(x_a - M_a).$$
Thus linear regression of the B- on the A-score signifies that D = 0, as must be true if both
its mean value and its variance ($V_d$) are each zero. That the mean value of D is zero is evident,
since
$$E_a(M_{b.a} - M_b) = E_a(M_{b.a}) - M_b = M_b - M_b,$$
and since $C_{ab}$ and $V_a$ are constants and $E_a(x_a - M_a) = 0$, the second term of the expression
vanishes.


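$$V_d = E_a(D^2) - [E_a(D)]^2 = E_a(D^2)$$
$$= E_a\left[(M_{b.a} - M_b) - \frac{C_{ab}}{V_a}(x_a - M_a)\right]^2$$
$$= E_a(M_{b.a} - M_b)^2 - \frac{2C_{ab}}{V_a}E_a(M_{b.a} - M_b)(x_a - M_a) + \frac{C_{ab}^2}{V_a^2}E_a(x_a - M_a)^2.$$
In the above $E_a(x_a - M_a)^2 = V_a$, $E_a(M_{b.a} - M_b)^2 = V(M_{b.a})$ ; and we may write as for (xl)
in 11.02 :
$$E_a(M_{b.a} - M_b)(x_a - M_a) = E_a(x_a \cdot M_{b.a}) - M_a \cdot E_a(M_{b.a}) - M_b \cdot E(x_a) + M_a M_b$$
$$= E(x_a \cdot x_b) - M_a M_b = \mathrm{Cov}(x_a, x_b),$$
$$\therefore\ V_d = V(M_{b.a}) - \frac{2C_{ab}^2}{V_a} + \frac{C_{ab}^2}{V_a} = V(M_{b.a}) - \frac{C_{ab}^2}{V_a}.$$
When D = 0, so that $V_d = 0$, as must be true when there is linear regression of the B- on the
A-score :
$$V(M_{b.a}) = \frac{C_{ab}^2}{V_a},$$
$$\therefore\ \frac{V(M_{b.a})}{V_b} = \frac{C_{ab}^2}{V_a \cdot V_b},$$
$$\text{i.e.}\quad \eta_{ba}^2 = r_{ab}^2.$$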


The sufficient, as well as the necessary condition for linear regression of the A- on the B-score
is deducible in the same way. In virtue of (i) linear regression in either dimension implies that
for perfect correlation
$$\eta_{ab}^2 = 1 = r_{ab}^2.$$
Thus linear regression in either dimension of the grid guarantees that $r_{ab}$ should have its
essential summarising property, namely limits of ± 1.
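The equality of the correlation ratio and the squared product-moment coefficient under linear regression can be illustrated as follows. The paired scores are hypothetical, chosen so that the column means (1, 2, 3 for A-scores 0, 1, 2) lie exactly on a straight line; nothing here is taken from the models of Vol. I.

```python
from fractions import Fraction as F

# Hypothetical paired scores (x_a, x_b) whose column means are exactly linear in x_a.
pairs = [(0, 0), (0, 2), (1, 0), (1, 2), (1, 4), (2, 3)]

def mean(values):
    """Exact arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values, F(0)) / len(values)

m_a = mean(a for a, _ in pairs)
m_b = mean(b for _, b in pairs)
v_a = mean((a - m_a) ** 2 for a, _ in pairs)
v_b = mean((b - m_b) ** 2 for _, b in pairs)
cov = mean((a - m_a) * (b - m_b) for a, b in pairs)

# Column means M_{b.a} and their variance about the grand mean M_b.
col_mean = {a0: mean(b for a, b in pairs if a == a0) for a0 in {a for a, _ in pairs}}
v_col_means = mean((col_mean[a] - m_b) ** 2 for a, _ in pairs)

k_ba = cov / v_a                    # regression coefficient, as in (iv)
eta_sq = v_col_means / v_b          # correlation ratio squared
r_sq = cov ** 2 / (v_a * v_b)       # product-moment coefficient squared

print(k_ba, eta_sq, r_sq)
```

Moving any one B-score so that its column mean leaves the line makes `eta_sq` exceed `r_sq`, which is the sense in which the comparison serves as a criterion of linearity.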


* * * * * * *


In practice it will rarely, if ever, happen that regression is strictly linear ; and it will be our
concern in Chapter 18 to examine situations in which the composition of a sample is consistent
with the assumption that regression is linear in the parent bivariate universe, though the relation
defined by (ii) above is not exactly true of the particular set of data. We are still free to define
a constant $k_{ba}$ in terms of sample covariance and sample variances in accordance with (iv) and
(vi), if we replace the observed mean B-score value ($M_{b.a}$) for a particular A-score by a hypothetical
regression score $x_{r.a}$ related to the corresponding A-score by a truly linear relation
analogous to (ii) above, i.e. :
$$x_{r.a} - M_b = k_{ba}(x_a - M_a) \qquad \text{(xiv)}$$
The mean of the regression scores $x_{r.a}$, like the mean of the within-column B-scores, is identical
with the grand mean, since
$$E_a(x_{r.a}) - M_b = k_{ba} \cdot E_a(x_a - M_a) = 0,$$
$$\therefore\ E_a(x_{r.a}) = M_b \qquad \text{(xv)}$$
By definition in accordance with (vi) above $k_{ba}$ in (xiv) has the meaning defined by
$$k_{ba} = \frac{\mathrm{Cov}(x_a, x_b)}{V_a} = r_{ab} \cdot \frac{\sigma_b}{\sigma_a}$$


So defined, $x_{r.a}$ is not necessarily an actual value of a sample B-score distribution for any
particular value the A-score may assume. It is merely the value $M_{b.a}$ would have if regression
were exactly linear ; but certain necessary relations between the $x_{r.a}$ score values and the B-score
distribution exist in virtue of their relation to $k_{ba}$ as defined above and to $M_b$. These do not
depend on any statistical meaning we may attach to $x_{r.a}$ at a later stage. The reader may
with profit take stock of them at this stage and return to what follows when we have occasion to
make use of them in Chapter 18.

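Within the sample column, i.e. for a fixed value of A,
$$E_{b.a}(x_{b.a} \cdot x_{r.a}) = x_{r.a} \cdot E_{b.a}(x_{b.a}) = M_{b.a} \cdot x_{r.a},$$
$$\therefore\ E(x_{b.a} \cdot x_{r.a}) = E_a(M_{b.a} \cdot x_{r.a}) \qquad \text{(xvi)}$$
By (xiv) we have
$$E_a(M_{b.a} \cdot x_{r.a}) = E_a[k_{ba}(x_a - M_a)M_{b.a} + M_b \cdot M_{b.a}]$$
$$= k_{ba} \cdot E_a(x_a - M_a)M_{b.a} + M_b^2$$
$$= k_{ba} \cdot E_a(x_a \cdot M_{b.a}) - k_{ba} \cdot M_a \cdot M_b + M_b^2$$
$$= k_{ba} \cdot \mathrm{Cov}(x_a, x_b) + M_b^2,$$
$$\therefore\ E(x_{b.a} \cdot x_{r.a}) = r_{ab}^2 \cdot V_b + M_b^2 \qquad \text{(xvii)}$$
By definition also
$$E(x_{r.a} - M_b)^2 = k_{ba}^2 \cdot V_a = r_{ab}^2 \cdot V_b \qquad \text{(xviii)}$$
We may now obtain expressions for mean square deviations of
(i) B-scores from corresponding values of $x_{r.a}$, i.e. $E(x_{b.a} - x_{r.a})^2$ ;
(ii) Mean B-scores from same, i.e. $E(M_{b.a} - x_{r.a})^2$.

First, however, we recall that
$$E(x_{b.a} - M_{b.a})^2 = M(V_{b.a}) = (1 - \eta_{ba}^2)V_b \qquad \text{(xix)}$$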
We may write the mean square deviations of the B-scores from the hypothetical regression
scores in the form
$$E(x_{b.a} - x_{r.a})^2 = E[(x_{b.a} - M_b) - (x_{r.a} - M_b)]^2$$
$$= E(x_{b.a} - M_b)^2 + E(x_{r.a} - M_b)^2 - 2E(x_{b.a} - M_b)(x_{r.a} - M_b).$$


446 CHANCE AND CHOICE BY CARDPACK AND CHESSBOARD 


Whence, from (xviii) and by definition of $V_b$,
$$E(x_{b.a} - x_{r.a})^2 = V_b + r_{ab}^2 \cdot V_b - 2E(x_{b.a} - M_b)(x_{r.a} - M_b) \qquad \text{(xx)}$$
In this expression
$$E(x_{b.a} - M_b)(x_{r.a} - M_b) = E(x_{b.a} \cdot x_{r.a}) - M_b \cdot E(x_{r.a}) - M_b \cdot E(x_{b.a}) + M_b^2$$
$$= E(x_{b.a} \cdot x_{r.a}) - M_b^2.$$
Whence, from (xvii) above,
$$2E(x_{b.a} - M_b)(x_{r.a} - M_b) = 2r_{ab}^2 \cdot V_b.$$
Hence (xx) becomes
$$E(x_{b.a} - x_{r.a})^2 = (1 - r_{ab}^2)V_b \qquad \text{(xxi)}$$
In virtue of (xviii), we may therefore partition the total variance of the B-score distribution as
follows
$$r_{ab}^2 \cdot V_b + (1 - r_{ab}^2)V_b = V_b \qquad \text{(xxii)}$$
$$\therefore\ E(x_{r.a} - M_b)^2 + E(x_{b.a} - x_{r.a})^2 = E(x_{b.a} - M_b)^2 \qquad \text{(xxiii)}$$


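We shall now write the mean square deviations of the B-score means from the regression
scores as
$$E(M_{b.a} - x_{r.a})^2 = E[(x_{b.a} - x_{r.a}) - (x_{b.a} - M_{b.a})]^2$$
$$= E(x_{b.a} - x_{r.a})^2 + E(x_{b.a} - M_{b.a})^2 - 2E(x_{b.a} - M_{b.a})(x_{b.a} - x_{r.a}).$$
Whence, from (xix) and (xxi),
$$E(M_{b.a} - x_{r.a})^2 = (1 - \eta_{ba}^2)V_b + (1 - r_{ab}^2)V_b - 2E(x_{b.a} - M_{b.a})(x_{b.a} - x_{r.a}).$$
In the above
$$E(x_{b.a} - M_{b.a})(x_{b.a} - x_{r.a}) = E(x_{b.a}^2) - E(x_{b.a} \cdot x_{r.a}) - E(x_{b.a} \cdot M_{b.a}) + E(M_{b.a} \cdot x_{r.a})$$
$$= E(x_{b.a}^2) - E(M_{b.a}^2) = M(V_{b.a}),$$
$$\therefore\ 2E(x_{b.a} - M_{b.a})(x_{b.a} - x_{r.a}) = 2(1 - \eta_{ba}^2)V_b.$$
Whence we may write
$$E(M_{b.a} - x_{r.a})^2 = (1 - \eta_{ba}^2)V_b + (1 - r_{ab}^2)V_b - 2(1 - \eta_{ba}^2)V_b,$$
$$\therefore\ E(M_{b.a} - x_{r.a})^2 = (\eta_{ba}^2 - r_{ab}^2)V_b\ {}^{*} \qquad \text{(xxiv)}$$
We may now make a tripartite division of the B-score variance, since
$$(1 - \eta_{ba}^2)V_b + (\eta_{ba}^2 - r_{ab}^2)V_b + r_{ab}^2 \cdot V_b = V_b \qquad \text{(xxv)}$$
$$\therefore\ E(x_{b.a} - M_{b.a})^2 + E(M_{b.a} - x_{r.a})^2 + E(x_{r.a} - M_b)^2 = E(x_{b.a} - M_b)^2 \qquad \text{(xxvi)}$$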
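The tripartite division can be verified on any set of paired scores. The sketch below uses hypothetical pairs (not data from the text) whose column means are deliberately not in arithmetic progression, so that the middle component, the index of departure from linear regression, is greater than zero.

```python
from fractions import Fraction as F

# Hypothetical paired scores (x_a, x_b); column means are 1, 4, 2, i.e. not linear.
pairs = [(0, 0), (0, 2), (1, 3), (1, 5), (2, 1), (2, 3)]

def mean(values):
    """Exact arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values, F(0)) / len(values)

m_a = mean(a for a, _ in pairs)
m_b = mean(b for _, b in pairs)
v_a = mean((a - m_a) ** 2 for a, _ in pairs)
v_b = mean((b - m_b) ** 2 for _, b in pairs)
cov = mean((a - m_a) * (b - m_b) for a, b in pairs)

k_ba = cov / v_a
col_mean = {a0: mean(b for a, b in pairs if a == a0) for a0 in {a for a, _ in pairs}}
# Hypothetical regression scores x_{r.a} = M_b + k_ba (x_a - M_a).
reg_score = {a0: m_b + k_ba * (a0 - m_a) for a0 in col_mean}

eta_sq = mean((col_mean[a] - m_b) ** 2 for a, _ in pairs) / v_b
r_sq = cov ** 2 / (v_a * v_b)

within = mean((b - col_mean[a]) ** 2 for a, b in pairs)                  # E(x_b.a - M_b.a)^2
departure = mean((col_mean[a] - reg_score[a]) ** 2 for a, _ in pairs)    # E(M_b.a - x_r.a)^2
regression = mean((reg_score[a] - m_b) ** 2 for a, _ in pairs)           # E(x_r.a - M_b)^2

print(within, departure, regression, v_b)
```

The three components printed in turn are the within-column, departure-from-linearity and regression parts of the B-score variance, and their sum is the last figure.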


For later use, it will be convenient to express this partition in terms of the score-grid pattern,
by arranging our paired scores serially in c columns corresponding to c different values of the
A-score. If the number of individual B-score values in the ith column is $r_i$ and n is the total
number of paired scores,

ate | i=c 


eee 2 yao mit A =, ; ; . xvii) 
n j=1 


1=1 


* If regression is exactly linear $(\eta_{ba}^2 - r_{ab}^2) = 0$ and $\eta_{ba} = r_{ab}$. Otherwise, $\eta_{ba} > r_{ab}$ since the expression on the
left is necessarily positive, being the sum of square deviations from the regression score. Thus $E(M_{b.a} - x_{r.a})^2$
is an index of departure from linear regression.
On this understanding, if $b_{j.i}$ denotes any B-score value in the ith column :


1*=2 j=T1; 


Els) .¢— Ma a)? == >, 2 (b;. iM e = Som 


n;=1 
EM,..—%... =7 5 r (My. i Xr. = oes


E(x, a — My = E S rey, 2 
West 
We then have 
i=cj="; 
MI, =Sin= > > rá: Mi... (avi) 
i=1 j=1 
E = MR dt o o o a) 
1 == 
ek ce a E) 
¿=1 


There still remain two important tautologies of a correlation grid for future reference. Let
us denote by $r_{bm}$ the correlation coefficient between the B-score distribution and the B-score
column means. This merely signifies that we replace the border A-scores by corresponding
values of $M_{b.a}$ as written at the foot in most of the grids of Chapter 9 in Vol. I. To say that
regression is exactly linear is to say that the substitution merely involves a change of scale and/or
origin ; but we have seen in Chapter 8 (p. 353) that change of scale and origin of either set of
scores does not affect the value of r. Thus linear regression implies the identity
$$r_{bm} = r_{ab}.$$
This is implicit in other identities cited above. By definition
$$r_{bm} = \frac{\mathrm{Cov}(M_{b.a}, x_b)}{\sqrt{V(M_{b.a}) \cdot V_b}}$$
In the above, linear regression signifies that
$$\mathrm{Cov}(M_{b.a}, x_b) = E(M_{b.a} - M_b)X_b = k_{ba} \cdot E(X_a \cdot X_b)$$
$$= k_{ba} \cdot \mathrm{Cov}(x_a, x_b).$$
Whence, from (iv) above,
$$\mathrm{Cov}(M_{b.a}, x_b) = k_{ba}^2 \cdot V_a.$$
Also from (xi) above, when regression is linear,
$$\sqrt{V(M_{b.a}) \cdot V_b} = \sqrt{k_{ba}^2 \cdot V_a \cdot V_b} = k_{ba} \cdot \sigma_a \cdot \sigma_b.$$
Thus we have
$$r_{bm} = k_{ba} \cdot \frac{\sigma_a}{\sigma_b} = r_{ab} \qquad \text{(xxxi)}$$


The reader will later find that this result is important in connexion with the definition of 
the multiple regression coefficient (Chapter 18). 
EXERCISE 11.04


The student may check numerically any of the relations of this section by recourse to the models cited 
as Examples and Exercises in Chapter 9 of Vol. I or by reference to the Model of Fig. 90. 

The reader will also find ample opportunities for testing the tautologies involving relations between 
B-scores, mean B-scores for a fixed A-score, and hypothetical regression scores at the end of this section 
by interchanging members of any number of consecutive pairs of columns in the Model of Fig. 93. This 
will have the effect of displacing individual mean column values from what we shall later define as the 
line of best fit through the whole assemblage of mean column scores. 


11.05 TAUTOLOGIES OF THE SCORE-GRID


In Chapter 10 of Vol. I we have explored what assumptions we make about the structure of a 
universe, when we attempt to make a balance sheet with respect to sources of variation from the 
information a sample supplies ; but we did not attempt to specify how we estimate the compon- 
ents of variation from sample data from the universe of our Handicap Score-grid Model. When 
we later seek a rationale for the statistical procedures commonly subsumed by the expression 
analysis of variance, it will be helpful to be clear about what characteristics of a score-grid are 
tautologies of any such lay-out of numbers, regardless of considerations relevant to statistical 
principles or of what conclusions are our preoccupation when concerned with sources of 
variation. 

The notation employed in the preceding sections refers only to the type of grid which exhibits
cell frequencies referable to cell-scores themselves functionally related to either or both of two
sets of border-scores. When discussing the type elsewhere spoken of as a score-grid, it is necessary
to modify our symbolism. In such a 2-dimensional lay-out (p. 429) of c columns and r rows,
each cell entry is a single score with gross frequency $(cr)^{-1}$. Its frequency within a column is
$r^{-1}$, and within a row $c^{-1}$. If $x_{ij}$ is the cell-score of column i and row j, we may define below


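TABLE 1

                               Mean                                                          Variance
Whole grid                     $M = \frac{1}{rc}\sum_{i=1}^{i=c}\sum_{j=1}^{j=r} x_{ij}$      $V = \frac{1}{rc}\sum_{i=1}^{i=c}\sum_{j=1}^{j=r}(x_{ij} - M)^2$
Within-row (jth) scores        $M_j = \frac{1}{c}\sum_{i=1}^{i=c} x_{ij}$                     $V_j = \frac{1}{c}\sum_{i=1}^{i=c}(x_{ij} - M_j)^2$
Within-column (ith) scores     $M_i = \frac{1}{r}\sum_{j=1}^{j=r} x_{ij}$                     $V_i = \frac{1}{r}\sum_{j=1}^{j=r}(x_{ij} - M_i)^2$
Row means                      $M = \frac{1}{r}\sum_{j=1}^{j=r} M_j$                          $V(M_r) = \frac{1}{r}\sum_{j=1}^{j=r}(M_j - M)^2$
Column means                   $M = \frac{1}{c}\sum_{i=1}^{i=c} M_i$                          $V(M_c) = \frac{1}{c}\sum_{i=1}^{i=c}(M_i - M)^2$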



TABLE 2

                               Mean                           Variance
Whole grid                     $E_r \cdot E_c(x_{rc}) = M$    $E_r \cdot E_c(x_{rc} - M)^2 = V = E_r \cdot E_c(x_{rc}^2) - M^2$
Within-row scores              $E_c(x_{rc}) = M_r$            $E_c(x_{rc} - M_r)^2 = V_r = E_c(x_{rc}^2) - M_r^2$
Within-column scores           $E_r(x_{rc}) = M_c$            $E_r(x_{rc} - M_c)^2 = V_c = E_r(x_{rc}^2) - M_c^2$
Row means                      $E_r(M_r) = M$                 $E_r(M_r - M)^2 = V(M_r) = E_r(M_r^2) - M^2$
Column means                   $E_c(M_c) = M$                 $E_c(M_c - M)^2 = V(M_c) = E_c(M_c^2) - M^2$


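certain summarising indices of the grid as in Table 1. Three operative symbols suffice to express
these relations compactly, viz. :
$$E_r(\ldots) = \frac{1}{r}\sum_{j=1}^{j=r}(\ldots), \qquad E_c(\ldots) = \frac{1}{c}\sum_{i=1}^{i=c}(\ldots)$$
$$\text{and}\quad E(\ldots) = E_r \cdot E_c(\ldots) = E_c \cdot E_r(\ldots).$$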


Under the sign of the preceding operators, we may label $x_{ij}$ indifferently as $x_{cr}$ or $x_{rc}$ without
confusion, and parameters accordingly as in Table 2. It is important to recognise that the
operation $E_r$ signifies extracting the mean value of cell-scores or functions thereof from row to
row. Hence $E_r(x_{rc})$ signifies taking the mean of the r cell-scores within a column. Similarly,
$E_c$ signifies extracting the mean value of cell-scores or functions thereof from column to column.
Thus $E_c(x_{rc}^2)$ signifies taking the mean of the c square scores within a row. Table 2 shows the
same entries as Table 1 in the new notation. Without ambiguity we may write
$$E_r(V_r) = M(V_r) \quad\text{and}\quad E_c(V_c) = M(V_c).$$


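If $u_{rc}$ is any cell-score function (e.g. $x_{rc}^2$), $P_r$ and $P_c$ being row parameters and column parameters
respectively, the following rule holds good for constants definitive of scale and origin :
$$E_r(A \cdot u_{rc} + k) = A \cdot E_r(u_{rc}) + k \quad\text{and}\quad E_c(A \cdot u_{rc} + k) = A \cdot E_c(u_{rc}) + k ;$$
$$E_r(A \cdot P_r + k) = A \cdot E_r(P_r) + k \quad\text{and}\quad E_c(A \cdot P_c + k) = A \cdot E_c(P_c) + k.$$
Since $M_c$ and $V_c$ are constants w.r.t. rows within a column as are $M_r$ and $V_r$ w.r.t. columns
within a row,
$$E_r(M_c) = M_c \quad\text{and}\quad E_r(V_c) = V_c ;$$
$$E_c(M_r) = M_r \quad\text{and}\quad E_c(V_r) = V_r.$$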
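The following identities implicit in our definitions will be useful in what follows :
$$E(M_r \cdot x_{rc}) = E_r[M_r \cdot E_c(x_{rc})] = E_r(M_r^2) \qquad \text{(ii)}$$
$$E(M_c \cdot x_{rc}) = E_c[M_c \cdot E_r(x_{rc})] = E_c(M_c^2) \qquad \text{(iii)}$$
$$E(M \cdot x_{rc}) = M \cdot E_r \cdot E_c(x_{rc}) = M^2 \qquad \text{(iv)}$$
$$E(M_r \cdot M_c) = E_r[M_r \cdot E_c(M_c)] = E_r(M_r \cdot M) = M^2 \qquad \text{(v)}$$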
$$E(M \cdot M_r) = M^2 = E(M \cdot M_c) \qquad \text{(vi)}$$
In this notation the fundamental tautology of the grid takes the following form :
$$V = E(x_{rc}^2) - M^2 ;$$
$$V(M_r) = E_r(M_r^2) - M^2 ;$$
$$V_r = E_c(x_{rc}^2) - M_r^2 ;$$
$$M(V_r) = E_r \cdot E_c(x_{rc}^2) - E_r(M_r^2) = E(x_{rc}^2) - E_r(M_r^2),$$
$$\therefore\ M(V_r) + V(M_r) = E(x_{rc}^2) - M^2 = V.$$
Similarly we derive
$$M(V_c) + V(M_c) = V.$$
Whence we have
$$M(V_r) + V(M_r) = V = M(V_c) + V(M_c) \qquad \text{(vii)}$$
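Since (vii) is a tautology of any rectangular lay-out of numbers, it can be checked on an arbitrary grid. The following sketch uses a hypothetical 3-row, 4-column score-grid (the scores are assumptions for illustration) and computes every quantity directly from its definition in Table 1.

```python
from fractions import Fraction as F

# Hypothetical score-grid: 3 rows (j) by 4 columns (i).
grid = [[2, 5, 1, 4],
        [3, 3, 6, 0],
        [7, 2, 2, 5]]

def mean(values):
    """Exact arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values, F(0)) / len(values)

def variance(values):
    """Mean square deviation from the mean of the values themselves."""
    values = list(values)
    m = mean(values)
    return mean((x - m) ** 2 for x in values)

rows = grid
cols = [list(col) for col in zip(*grid)]
all_scores = [x for row in grid for x in row]

V = variance(all_scores)
M_V_r = mean(variance(row) for row in rows)    # mean within-row variance, M(V_r)
V_M_r = variance(mean(row) for row in rows)    # variance of row means, V(M_r)
M_V_c = mean(variance(col) for col in cols)    # mean within-column variance, M(V_c)
V_M_c = variance(mean(col) for col in cols)    # variance of column means, V(M_c)

print(M_V_r + V_M_r, V, M_V_c + V_M_c)
```

Because every row has the same number of cells, the mean of the row means is the grand mean, which is why `variance` applied to the row means gives $V(M_r)$ as defined.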
In the Analysis of Variance we shall have recourse to a grid parameter which we can define 
alternatively in virtue of (vii) as 
V — V(M.) — V(M,) = V: = MV) + MV)-—-V . ; . (viii) 
Whence it follows that we can also define it in terms of square deviations of scores and means by 
ro.V,=7cV — rcV(M,) — rc. V(M,) i F F . (ix) 


For purposes of computation it is useful to set out a score-grid in duplicate for scores and square
scores as in the numerical example below. All the requisite data are then in the marginal totals
and their squares. For the grand total of square scores, we shall write $S_x$, so that
$$S_x = \sum_{i=1}^{i=c}\sum_{j=1}^{j=r} x_{ij}^2 \qquad \text{(x)}$$
The following code defines other requisite quantities :

                 Score Totals                                                 Sums of Squares
Row              $T_j = \sum_{i=1}^{i=c} x_{ij}$                              $S_r = \frac{1}{c}\sum_{j=1}^{j=r} T_j^2$        (xi)
Column           $T_i = \sum_{j=1}^{j=r} x_{ij}$                              $S_c = \frac{1}{r}\sum_{i=1}^{i=c} T_i^2$        (xii)
Whole Grid       $T = \sum_{i=1}^{i=c}\sum_{j=1}^{j=r} x_{ij}$                $S = \frac{1}{rc}T^2$                            (xiii)
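By definition
$$V(M_r) = \frac{1}{r}\sum_{j=1}^{j=r} M_j^2 - M^2 = \frac{1}{r}\sum_{j=1}^{j=r}\left(\frac{T_j}{c}\right)^2 - \left(\frac{T}{rc}\right)^2,$$
$$\therefore\ V(M_r) = \frac{1}{rc}S_r - \frac{1}{rc}S.$$
Similarly, we derive
$$V(M_c) = \frac{1}{rc}S_c - \frac{1}{rc}S.$$
Likewise by definition
$$V = \frac{1}{rc}\sum_{i=1}^{i=c}\sum_{j=1}^{j=r} x_{ij}^2 - M^2 = \frac{1}{rc}S_x - \frac{1}{rc}S.$$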
Whence from (viii)


= =—S+—5, cas == $, 
We may write these results in the form :
$$rc \cdot V = S_x - S \qquad \text{(xiva)}$$
$$rc \cdot V(M_r) = S_r - S \qquad \text{(xivb)}$$
$$rc \cdot V(M_c) = S_c - S \qquad \text{(xivc)}$$
r.V,=S,+S—S.—S, . (xivd) 


The last expression is obtainable directly from the foregoing by recourse to (ix). In the following
example r = 2 and c = 3 :

             Scores                              Square Scores
             i=1   i=2   i=3   Total             i=1   i=2   i=3   Total
j=1           1     4     1      6                 1    16     1     18
j=2           3     5     4     12                 9    25    16     50
Total         4     9     5     18                10    41    17     68


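From the foregoing code
$$S = \tfrac{1}{6}(18^2) = 54 ; \qquad S_x = 68 ;$$
$$S_r = \tfrac{1}{3}(180) = 60 ; \qquad S_c = \tfrac{1}{2}(122) = 61.$$
Whence we have
$$V = \tfrac{1}{6}(68 - 54) = \tfrac{7}{3}$$
$$V(M_r) = \tfrac{1}{6}(60 - 54) = 1$$
$$V(M_c) = \tfrac{1}{6}(61 - 54) = \tfrac{7}{6}$$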
As a check we can of course lay out the computation from first principles but with less economy of effort,
as below

          $x_{ij}$    $x_{ij}^2$    $M_j$    $(x_{ij} - M_j)^2$    $M_i$    $(x_{ij} - M_i)^2$
             1           1           2            1               2.0           1
             4          16           2            4               4.5          1/4
             1           1           2            1               2.5          9/4
             3           9           4            1               2.0           1
             5          25           4            1               4.5          1/4
             4          16           4            0               2.5          9/4
Total       18          68          18            8               18            7
Mean         3         34/3          3           4/3               3           7/6
             M       $V + M^2$       M     $M(V_r) = V - V(M_r)$   M     $M(V_c) = V - V(M_c)$
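The sum of squares schema is easily mechanised. The sketch below applies (x)-(xiv) to the six scores of the worked example, read off the first-principles table as the rows 1, 4, 1 and 3, 5, 4; the variable names are illustrative only.

```python
from fractions import Fraction as F

# The six scores of the worked example: r = 2 rows, c = 3 columns.
grid = [[1, 4, 1],
        [3, 5, 4]]
r, c = len(grid), len(grid[0])

row_totals = [sum(row) for row in grid]          # T_j
col_totals = [sum(col) for col in zip(*grid)]    # T_i
total = sum(row_totals)                          # T

S_x = sum(x * x for row in grid for x in row)    # grand total of square scores
S_r = F(sum(t * t for t in row_totals), c)       # (1/c) * sum of squared row totals
S_c = F(sum(t * t for t in col_totals), r)       # (1/r) * sum of squared column totals
S = F(total * total, r * c)                      # T^2 / rc

V = (S_x - S) / (r * c)
V_M_r = (S_r - S) / (r * c)
V_M_c = (S_c - S) / (r * c)
M_V_r = (S_x - S_r) / (r * c)   # equals V - V(M_r)
M_V_c = (S_x - S_c) / (r * c)   # equals V - V(M_c)

print(S_x, S_r, S_c, S)
print(V, V_M_r, V_M_c, M_V_r, M_V_c)
```

Only the marginal totals and the total of square scores enter the computation, which is the economy of effort the schema is meant to secure.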
[FIG. 85. The 3-dimensional Score-Grid: operations in a 3-dimensional score-grid.]


The foregoing relations refer to a 2-dimensional grid admitting 2 criteria of classifying the
constituent scores. Let us now accommodate 3 class specifications in a grid (Fig. 85) we can
visualise by addition of a vertical dimension of layers we label as h = 1, 2, 3 . . . n. We shall
denote our scores as $x_{hij}$ accordingly. The total number of cells in the grid is then nrc distributed
as follows :

Layers       rc cells      Row-Slabs    nc cells      Column-Slabs    nr cells
Pillars      n cells       Rows         c cells       Columns         r cells
With this end in view we may retain the symbols $E_r$ and $E_c$ as within-layer operations in exactly
the same sense as before, adding the symbol $E_n$ with the following meaning :
$$E_n(\ldots) = \frac{1}{n}\sum_{h=1}^{h=n}(\ldots)$$
Accordingly we define E anew as the operation w.r.t. all cells of the grid as
$$E = E_n \cdot E_r \cdot E_c = E_r \cdot E_n \cdot E_c, \text{ etc.}$$


in any order. To distinguish parameters of the whole grid, parameters of two dimensions
(layer, row-slab, column-slab) and parameters of one dimension (pillar, row, column) we may employ
a convention already used in connexion with partial correlation (Chapter 9, Vol. I). By $P_x$
we signify a parameter of the whole grid (e.g. the total variance $V_x$). By addition of n, r or c
respectively after a dot we signify that the parameter refers to one of n layers, one of r row-slabs
or one of c column-slabs. By addition of rc, nr or nc respectively we identify it accordingly with
one of rc pillars, one of nr rows or one of nc columns. Table 3 gives the code, to which it is
only necessary to add
$$E_r \cdot E_c(V_{x.rc}) = M(V_{x.rc}) ; \quad E_r(V_{x.r}) = M(V_{x.r}) ; \quad E_c(V_{x.c}) = M(V_{x.c}), \text{ etc.}$$

Corresponding to the two identities of the 2-dimensional grid specified by (vii) we may now
formulate 12 w.r.t. the solid score-grid. Since each layer, row-slab or column-slab is itself a
2-dimensional grid, the following six call for no comment, being implicit in (vii) :
Within layers :
$$E_r(V_{x.nr}) + E_r(M_{x.nr} - M_{x.n})^2 = V_{x.n} = E_c(V_{x.nc}) + E_c(M_{x.nc} - M_{x.n})^2 \qquad \text{(xvi)}$$
Within row-slabs :
$$E_n(V_{x.nr}) + E_n(M_{x.nr} - M_{x.r})^2 = V_{x.r} = E_c(V_{x.rc}) + E_c(M_{x.rc} - M_{x.r})^2 \qquad \text{(xvii)}$$
Within column-slabs :
$$E_n(V_{x.nc}) + E_n(M_{x.nc} - M_{x.c})^2 = V_{x.c} = E_r(V_{x.rc}) + E_r(M_{x.rc} - M_{x.c})^2 \qquad \text{(xviii)}$$

Other results follow from the possibility of rearranging the grid cells. We may put the
nc cells of each row-slab end to end to make a 2-dimensional grid whose total variance is $V_x$.
This grid has still r rows of nc cells and nc columns of r cells, the row variance being $V_{x.r}$ and
row mean $M_{x.r}$. Accordingly, we have the row-slab identity :
$$E_r(M_{x.r} - M_x)^2 + E_r(V_{x.r}) = V_x.$$
If we put the nr cells of a column-slab in single file we can make a grid of c columns, with column
variance $V_{x.c}$, the mean being $M_{x.c}$. Again the total variance is $V_x$, whence the column-slab
identity :
$$E_c(M_{x.c} - M_x)^2 + E_c(V_{x.c}) = V_x.$$
If we lay out the rc cells of a layer as a single row, we may make a grid of n rows with row variance
$V_{x.n}$, etc., whence the layer identity :
$$E_n(M_{x.n} - M_x)^2 + E_n(V_{x.n}) = V_x.$$
Finally, we may derive the following relation between pillars :
$$E_r \cdot E_c(M_{x.rc} - M_x)^2 = E_r \cdot E_c(M_{x.rc}^2) - 2M_x \cdot E_r \cdot E_c(M_{x.rc}) + M_x^2$$
$$= E_r \cdot E_c(M_{x.rc}^2) - M_x^2 ;$$
$$E_r \cdot E_c(V_{x.rc}) = E_r \cdot E_c \cdot E_n(x_{hij}^2) - E_r \cdot E_c(M_{x.rc}^2),$$
$$\therefore\ E_r \cdot E_c(M_{x.rc} - M_x)^2 + E_r \cdot E_c(V_{x.rc}) = E(x_{hij}^2) - M_x^2 = V_x.$$
Similarly we may derive the two corresponding identities for columns and for rows.

[TABLE 3: means, variance of cell scores and variance of means for the whole grid, slabs, pillars, rows and columns.]

We may write the identities last derived without ambiguity as
$$V(M_{x.r}) + M(V_{x.r}) = V_x$$
$$V(M_{x.c}) + M(V_{x.c}) = V_x$$
$$V(M_{x.n}) + M(V_{x.n}) = V_x$$
$$V(M_{x.rc}) + M(V_{x.rc}) = V_x \qquad \text{(xxii)}$$
$$V(M_{x.nc}) + M(V_{x.nc}) = V_x \qquad \text{(xxiii)}$$
$$V(M_{x.nr}) + M(V_{x.nr}) = V_x \qquad \text{(xxiv)}$$


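In the Analysis of Variance for 3 criteria of classification, the sample parameters which we
shall use are $V(M_{x.r})$, $V(M_{x.c})$, $V(M_{x.n})$, $M(V_{x.rc})$, $M(V_{x.nc})$, $M(V_{x.nr})$ and three
others defined respectively as follows :
$$V(M_{x.rc}) - V(M_{x.r}) - V(M_{x.c}) = V_{rc} = M(V_{x.r}) + M(V_{x.c}) - M(V_{x.rc}) - V_x \qquad \text{(xxv)}$$
$$V(M_{x.nc}) - V(M_{x.n}) - V(M_{x.c}) = V_{nc} \qquad \text{(xxvi)}$$
$$V(M_{x.nr}) - V(M_{x.n}) - V(M_{x.r}) = V_{nr} \qquad \text{(xxvii)}$$
For economy of computation we may employ the following schema in which
$$S_x = \sum_{h=1}^{h=n}\sum_{i=1}^{i=c}\sum_{j=1}^{j=r} x_{hij}^2 \qquad \text{(xxviii)}$$
The other items of the code are :

Cell or Marginal Totals, with Mean                                                        Sums of Squares
$T_{ij} = \sum_{h=1}^{h=n} x_{hij} = n \cdot M_{x.rc}$                                    $S_{rc} = \frac{1}{n}\sum_{i=1}^{i=c}\sum_{j=1}^{j=r} T_{ij}^2$      (xxix.a)
$T_{hi} = \sum_{j=1}^{j=r} x_{hij} = r \cdot M_{x.nc}$                                    $S_{nc} = \frac{1}{r}\sum_{h=1}^{h=n}\sum_{i=1}^{i=c} T_{hi}^2$      (xxix.b)
$T_{hj} = \sum_{i=1}^{i=c} x_{hij} = c \cdot M_{x.nr}$                                    $S_{nr} = \frac{1}{c}\sum_{h=1}^{h=n}\sum_{j=1}^{j=r} T_{hj}^2$      (xxix.c)
$T_j = \sum_{h=1}^{h=n}\sum_{i=1}^{i=c} x_{hij} = nc \cdot M_{x.r}$                       $S_r = \frac{1}{nc}\sum_{j=1}^{j=r} T_j^2$                           (xxix.d)
$T_i = \sum_{h=1}^{h=n}\sum_{j=1}^{j=r} x_{hij} = nr \cdot M_{x.c}$                       $S_c = \frac{1}{nr}\sum_{i=1}^{i=c} T_i^2$                           (xxix.e)
$T_h = \sum_{i=1}^{i=c}\sum_{j=1}^{j=r} x_{hij} = rc \cdot M_{x.n}$                       $S_n = \frac{1}{rc}\sum_{h=1}^{h=n} T_h^2$                           (xxix.f)
$T = \sum_{h=1}^{h=n}\sum_{i=1}^{i=c}\sum_{j=1}^{j=r} x_{hij} = ncr \cdot M_x$            $S = \frac{1}{ncr}T^2$                                               (xxix.g)

The reader can check as an exercise that
$$ncr \cdot V_x = S_x - S \qquad \text{(xxx.a)}$$
$$ncr \cdot V(M_{x.r}) = S_r - S \qquad \text{(xxx.b)}$$
$$ncr \cdot V(M_{x.c}) = S_c - S \qquad \text{(xxx.c)}$$
$$ncr \cdot V(M_{x.n}) = S_n - S \qquad \text{(xxx.d)}$$
$$ncr \cdot M(V_{x.rc}) = S_x - S_{rc} \qquad \text{(xxx.e)}$$
$$ncr \cdot M(V_{x.nc}) = S_x - S_{nc} \qquad \text{(xxx.f)}$$
$$ncr \cdot M(V_{x.nr}) = S_x - S_{nr} \qquad \text{(xxx.g)}$$
$$ncr \cdot V_{rc} = S + S_{rc} - S_r - S_c \qquad \text{(xxx.j)}$$
$$ncr \cdot V_{nr} = S + S_{nr} - S_n - S_r \qquad \text{(xxx.k)}$$
$$ncr \cdot V_{nc} = S + S_{nc} - S_n - S_c \qquad \text{(xxx.l)}$$
$$ncr \cdot [V_x - V(M_{x.r}) - V(M_{x.c}) - V(M_{x.n})] = S_x + 2S - S_r - S_c - S_n$$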
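The schema for the solid grid can be checked in the same spirit as that for the 2-dimensional grid. The sketch below uses a hypothetical grid of n = 2 layers, r = 2 rows and c = 3 columns (the scores are assumptions for illustration) and compares the sums of squares with variances computed directly from means; the helper names are not part of the text.

```python
from fractions import Fraction as F

# Hypothetical scores x[h][j][i]: layer h, row j, column i (n = 2, r = 2, c = 3).
x = [[[1, 4, 2], [3, 0, 5]],
     [[2, 2, 6], [1, 3, 1]]]
n, r, c = len(x), len(x[0]), len(x[0][0])
N = n * r * c
cells = [(h, j, i) for h in range(n) for j in range(r) for i in range(c)]

def totals(key):
    """Totals of the scores grouped by key(h, j, i)."""
    out = {}
    for h, j, i in cells:
        out[key(h, j, i)] = out.get(key(h, j, i), 0) + x[h][j][i]
    return out

def sum_of_squares(key):
    """Sum of squared group totals divided by the number of cells per group."""
    t = totals(key)
    size = N // len(t)
    return F(sum(v * v for v in t.values()), size)

def variance_of_means(key):
    """Variance of the group means about the grand mean (groups of equal size)."""
    t = totals(key)
    size = N // len(t)
    grand_mean = F(sum(t.values()), N)
    return sum((F(v, size) - grand_mean) ** 2 for v in t.values()) / len(t)

whole = lambda h, j, i: ()
by_row_slab = lambda h, j, i: j
by_column_slab = lambda h, j, i: i
by_layer = lambda h, j, i: h
by_pillar = lambda h, j, i: (j, i)

S_x = sum(x[h][j][i] ** 2 for h, j, i in cells)
S = sum_of_squares(whole)
S_r, S_c, S_n = (sum_of_squares(k) for k in (by_row_slab, by_column_slab, by_layer))
S_rc = sum_of_squares(by_pillar)

M_x = F(sum(x[h][j][i] for h, j, i in cells), N)
V_x = sum((x[h][j][i] - M_x) ** 2 for h, j, i in cells) / N
V_rc = (variance_of_means(by_pillar)
        - variance_of_means(by_row_slab)
        - variance_of_means(by_column_slab))

print(N * V_x == S_x - S)
print(N * variance_of_means(by_layer) == S_n - S)
print(N * V_rc == S + S_rc - S_r - S_c)
```

The remaining relations of the schema can be tested by adding keys such as `lambda h, j, i: (h, i)` for the columns within layers.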


The following simple numerical example, in which n = r = c = 3, illustrates the procedure :
(1) (ii) 
Scores 
ER es dase Within-cell sums of Squares 
| 

ji dS 359.0 2.4.3 | 3.19 | 12 29 14 

ee obs hoe Rs ORs See CNE 
jaz Se 513 | 3.6.3 | 14 35 54 
j=3 Lt. 6.71 | O. Se 4] 41 54 

| 

Sy 


(iii) cA 


Cell Totals 


Square Cell Totals (T;;) 


36 81 36 
| 36 81 144 
81 81 144 
Total 


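From the above entries we have
$$S_x = 294 ; \quad S_{rc} = \tfrac{1}{3}(720) = 240 ; \quad S = \tfrac{1}{27}(78^2) = 225\tfrac{1}{3} ;$$
$$S_c = S_r = \tfrac{1}{9}(2070) = 230.$$
Whence we obtain
$$V_x = \tfrac{1}{27}(294 - 225\tfrac{1}{3}) = \tfrac{206}{81} ;$$
$$M(V_{x.rc}) = \tfrac{1}{27}(294 - 240) = 2 ;$$
$$V(M_{x.c}) = V(M_{x.r}) = \tfrac{1}{27}(230 - 225\tfrac{1}{3}) = \tfrac{14}{81}.$$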

306 


720 
To compute $V(M_{x.n})$ we must assume some definite order of the items in each cell of (i) corresponding
to the 3rd criterion of classification, and assume that we have therein arranged them
accordingly. We then have the following lay-out :


(v) 
252.3 2.4.3 $34 2 
2364 TE FT. 3.6.3 
4.0.5 eee 5 
Layer 1 8 13 11 
Be a eee 
eo ls 


From (v) we derive 
S, = 2984; 5 


9 ) 
V(M x .n) eg vee 
To derive $M(V_{x.nc})$, $M(V_{x.nr})$ and hence $V_{nr}$ or $V_{nc}$ we shall need to rearrange the items of (i)
alternatively as below.


(vi) (vii) 
ial ¡=2 ¡=3 hed h=2 3 


Square Cell Totals (Tr). 


Square Cell Totals (77;). 


49 49 49 
100 100 49 


64 


Total 734 


Whence we obtain 


$$27 \cdot M(V_{x.nc}) = 294 - \tfrac{1}{3}(734) ; \quad 27 \cdot M(V_{x.nr}) = 294 - \tfrac{1}{3}(734),$$
$$\therefore\ M(V_{x.nc}) = \tfrac{148}{81} ; \quad M(V_{x.nr}) = \tfrac{148}{81}.$$


Var = T and V,,= 2 
EXERCISE 11.05


1. Check the dual relation defined by $V(M_r) + M(V_r) = V = V(M_c) + M(V_c)$ with respect
to the following sets of scores first by direct computation and then by the sum of squares schema, i.e.
$$S_x - S = cr \cdot V ; \quad S_r - S = cr \cdot V(M_r) ; \quad S_c - S = cr \cdot V(M_c) ;$$
$$cr \cdot M(V_r) = S_x - S_r \quad\text{and}\quad cr \cdot M(V_c) = S_x - S_c.$$


2. For each of the above determine 
V, = V — V(M,) — V(M,) = M(V,) + M(V,) — V. 


3. By recourse to the E notation of this section show that 
E(«,, — M, — M, + MY = V, 
Check this result by direct computation w.r.t. the foregoing numerical examples. 


Hint. Put $(x_{rc} - M_r - M_c + M)^2 = [(x_{rc} - M_r) - (M_c - M)]^2$.


4. Determine the parameters $V_x$, $V(M_{x.r})$, $V(M_{x.c})$, $V(M_{x.n})$, $M(V_{x.rc})$, $M(V_{x.nc})$ and
$M(V_{x.nr})$ for the following set-up by direct computation in accordance with definition and by
recourse to the sum of squares schema :
$$S_x - S = ncr \cdot V_x ; \quad S_r - S = ncr \cdot V(M_{x.r}) ; \quad S_c - S = ncr \cdot V(M_{x.c}) ; \quad S_n - S = ncr \cdot V(M_{x.n})$$
$$S_x - S_{rc} = ncr \cdot M(V_{x.rc}) ; \quad S_x - S_{nc} = ncr \cdot M(V_{x.nc}) ; \quad S_x - S_{nr} = ncr \cdot M(V_{x.nr})$$


q 
22 4.1 6.3 tel 
1.3 5.4 1 0.2 
4.2 2.9 4.5 2.3 


5. By recourse to the E notation show that
$$E(M_{rc} - M_r - M_c + M)^2 = V_{rc}.$$
Hint. Remember that the parameter of any particular dimension of a grid is a constant w.r.t. an
E operator in any other dimension, so that
$$E_n(V_{x.c}) = V_{x.c} = E_r(V_{x.c}) ; \quad E_c(V_{x.r}) = V_{x.r} = E_n(V_{x.r}) ;$$
$$E_r(V_{x.n}) = V_{x.n} = E_c(V_{x.n}) ; \quad E_n(V_{x.rc}) = V_{x.rc}.$$


6. Use the data of Example 4 to determine $V_{nr}$ and $V_{nc}$ defined by analogy with $V_{rc}$, i.e.
$$V_{nr} = M(V_n) + M(V_r) - M(V_{nr}) - V ;$$
$$V_{nc} = M(V_n) + M(V_c) - M(V_{nc}) - V.$$
Show that the results are numerically consistent with the alternative definitions :
$$V_{nr} = E(M_{nr} - M_n - M_r + M)^2 ;$$
$$V_{nc} = E(M_{nc} - M_n - M_c + M)^2.$$
11.06 ADDITION OF COVARIANCE 


It is possible to extend the use of the symbolism of 11.02 to more than 2 dimensions. Thus 
our concern may be with a function $F_{abc}$ of 3 score-sets $x_a$, $x_b$, $x_c$, in which case we may write 

$E_{c.ab}(F_{abc})$ = mean value of $F_{abc}$ for all values of c when both a and b remain constant. 

$E_{b.a} . E_{c.ab}(F_{abc})$ = mean value of $F_{abc}$ for all values of b and c when a remains constant. 

$E_a . E_{b.a} . E_{c.ab}(F_{abc})$ = mean value of $F_{abc}$ for all values of a, b and c. 

In this symbolism the operation of extracting the grand mean is 
A O O eg Begs A O 5 (i) 
A AA Ey. Be. Erle + +) ‘ (11) 
AA O A O A A +). 50) 


We may also need to combine the symbolic conventions of 11.02 and 11.05 to cover a case 
of special interest, e.g. when we have c sets of paired scores $x_a$ and $x_b$, for each set of which we 
can assign a covariance. We may then denote: 

(i) by $E_{ab.c}$ the operation of extracting the mean of a function of both sets of border-scores 
in one and the same set ; 

(ii) by $E_c$ the operation of extracting the mean of any parameter (e.g. $V_{a.c}$ or $V_{b.c}$) of a set. 

We should then write 
$$E = E_c . E_{ab.c}$$
In this case, the criteria of classification in the A- and B-dimensions of the 3-dimensional 
grid are quantitative, always being defined by the border-scores ; but the criterion of classification 
in the C-dimension is qualitative. For a case of special interest, we may write the covariance of 
the border-scores in a single set as 

$$E_{ab.c}(x_a - M_{a.c})(x_b - M_{b.c}) = \mathrm{Cov}(x_{a.c}, x_{b.c}) = E_{ab.c}(x_a . x_b) - M_{a.c} . M_{b.c} \qquad \text{(iv)}$$

Its mean value for all sets will then be 

$$E_c . \mathrm{Cov}(x_{a.c}, x_{b.c}) = M.\mathrm{Cov}(x_a, x_b) = E(x_a . x_b) - E_c(M_{a.c} . M_{b.c}) \qquad \text{(v)}$$

For the covariance of the paired values of $x_a$, $x_b$ treated as a whole, we must write 

$$E(x_a - M_a)(x_b - M_b) = \mathrm{Cov}(x_a, x_b) = E(x_a . x_b) - M_a . M_b \qquad \text{(vi)}$$

Now there are c pairs of mean values $M_{a.c}$, $M_{b.c}$ from which we may form 

$$E_c(M_{a.c} - M_a)(M_{b.c} - M_b) = \mathrm{Cov}(M_{a.c}, M_{b.c}) = E_c(M_{a.c} . M_{b.c}) - M_a . M_b \qquad \text{(vii)}$$

From (v)-(vii) we thus obtain a tautology of a 3-dimensional set-up reminiscent of (xxvi) in 
11.02, viz. : 

$$\mathrm{Cov}(x_a, x_b) = \mathrm{Cov}(M_{a.c}, M_{b.c}) + M.\mathrm{Cov}(x_a, x_b) \qquad \text{(viii)}$$

We can express this in a form involving regression coefficients in virtue of (viii) and (x) in 
11.03, viz. for regression of the B-score on the A-score : 


y _ Cov (Xa, Xp) . k _ Cov (Ma. o, My .c) z _ Cov (Xa. o eo) 
ba V, ) ma V( AS 3 ’ ba-c E AS - 
E O. ERA ESO A : A s 
If each of the c sets of paired values $x_a$, $x_b$ is divisible into r sub-sets, we may proceed to 
obtain expressions comparable to (viii) by recourse to the symbolic convention : 


Late Pardo ARE AE Dee Beds ds iia) 3 5 (x) 


Equation (ix) holds good only if regression is linear ; but (viii) is true of any set of numbers 
which we can pair off in layers of a 3-dimensional grid. Whether the layers are of equal size, 
i.e. whether the number ($r_i$) of pairs in the i-th layer is the same as in any other, is immaterial. 
For purposes of computation, we may suitably lay out our figures in 2 dimensions as below. 
The reader may test the relation involved by recourse to other collections of numbers arranged 
likewise. 


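Block.           U                  V                  W 
           x(a.u)  x(b.u)     x(a.v)  x(b.v)     x(a.w)  x(b.w) 
             3       5         10      12          1       3 
             2       2          8       9          3       6 
             4       7          5       6          2       5 
             2       4          3       3          1       2 
Totals      11      18         26      30          7      16 
Means      2.75     4.5        6.5     7.5        1.75    4.0 


The total of the 12 A-scores is (11 + 26 + 7) = 44; and that of the B-scores is (18 + 30 + 16) = 64. 
For the grand means we have therefore 

$$M_a = \frac{11}{3} \quad \text{and} \quad M_b = \frac{16}{3}.$$

The mean cross products of the three blocks are 13.75, 57.75 and 8.25 ; and the mean of the 
products of the three pairs of block means is (12.375 + 48.75 + 7.0) ÷ 3 = 545 ÷ 24. 

The mean value of all cross products is 
$$\frac{55 + 231 + 33}{12} = \frac{319}{12}.$$

Whence the total covariance is 
$$\frac{319}{12} - \frac{11}{3} . \frac{16}{3} = \frac{253}{36}.$$

The covariance of the means is 
$$\frac{545}{24} - \frac{11}{3} . \frac{16}{3} = \frac{227}{72}.$$

The covariances of the individual blocks are 
$$\frac{55}{4} - \frac{198}{16} = \frac{22}{16} ; \quad \frac{231}{4} - \frac{780}{16} = \frac{144}{16} ; \quad \frac{33}{4} - \frac{112}{16} = \frac{20}{16}.$$

The mean of the above is 
$$\frac{22 + 144 + 20}{48} = \frac{31}{8} \qquad \text{(xiii)}$$

Hence in accordance with (viii) we have 
$$\mathrm{Cov}(M_a, M_b) + M.\mathrm{Cov}(x_a, x_b) = \frac{227}{72} + \frac{279}{72} = \frac{506}{72} = \mathrm{Cov}(x_a, x_b).$$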
For rapid calculation it is preferable to work throughout with score-sums and sums of score 
products as follows. If $r_i$ is the number of pairs in the i-th layer of c blocks, we define : 

$$S_{a.i} = \sum_{j=1}^{r_i} x_{a.ij} ; \quad S_{b.i} = \sum_{j=1}^{r_i} x_{b.ij} \qquad \text{(xiv)}$$

$$S_a = \sum_{i=1}^{c} S_{a.i} ; \quad S_b = \sum_{i=1}^{c} S_{b.i} \qquad \text{(xv)}$$

$$S_{ab} = \sum_{i=1}^{c} \sum_{j=1}^{r_i} x_{a.ij} . x_{b.ij} \qquad \text{(xvi)}$$

$$S_p = \sum_{i=1}^{c} \frac{S_{a.i} . S_{b.i}}{r_i} \qquad \text{(xvii)}$$

$$n = \sum_{i=1}^{c} r_i \qquad \text{(xviii)}$$

In this symbolism : 

$$n . \mathrm{Cov}(x_a, x_b) = S_{ab} - \frac{S_a . S_b}{n} \qquad \text{(xix)}$$

$$n . M.\mathrm{Cov}(x_a, x_b) = S_{ab} - S_p \qquad \text{(xx)}$$

$$n . \mathrm{Cov}(M_a, M_b) = S_p - \frac{S_a . S_b}{n} \qquad \text{(xxi)}$$

In the foregoing example n = 12 and $r_i$ = 4 for all (3) values of i, the remaining relevant items being 

$$S_p = \frac{198}{4} + \frac{780}{4} + \frac{112}{4} = \frac{545}{2} ; \quad S_{ab} = 319 ; \quad S_a = 44 ; \quad S_b = 64.$$
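
The score-sum schema lends itself to a mechanical check. The following is a minimal sketch in Python, not part of the original text: the function name `covariance_parts` and the list `BLOCKS` are labels chosen here for illustration, and exact fractions are used so that the three covariances can be compared without rounding. It computes the total covariance, the mean covariance within blocks and the covariance of the block means from the score-sums alone, for the three blocks U, V, W tabulated above.

```python
from fractions import Fraction

# Paired (A-score, B-score) entries of blocks U, V, W as tabulated in the text.
BLOCKS = [
    [(3, 5), (2, 2), (4, 7), (2, 4)],      # block U
    [(10, 12), (8, 9), (5, 6), (3, 3)],    # block V
    [(1, 3), (3, 6), (2, 5), (1, 2)],      # block W
]


def covariance_parts(blocks):
    """Return (total, mean within-block, between-means) covariances,
    each computed from score-sums and sums of score products only."""
    n = sum(len(block) for block in blocks)
    s_a = sum(a for block in blocks for a, _ in block)
    s_b = sum(b for block in blocks for _, b in block)
    s_ab = sum(a * b for block in blocks for a, b in block)
    # S_p: for each block, (A-total * B-total) / (number of pairs), summed.
    s_p = sum(
        Fraction(sum(a for a, _ in block) * sum(b for _, b in block), len(block))
        for block in blocks
    )
    correction = Fraction(s_a * s_b, n)
    total = (s_ab - correction) / n
    within = (s_ab - s_p) / n
    between = (s_p - correction) / n
    return total, within, between


total, within, between = covariance_parts(BLOCKS)
print(total, within, between)
```

Since the two partial covariances are built from the same three sums, their total agrees identically with the covariance of the twelve pairs treated as a whole, whatever numbers are entered in the blocks.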


One consequence of (v) above, of importance in connexion with the theory of regression, 
is sufficiently elementary to merit comment in advance. Let us suppose that the same set of 
A-scores (i.e. the same values of $x_{a.c}$ in the same proportions) occur in each of the c sub-samples. 
We may then write $M_{a.c} = M_a$ for each set, whence 

$$E_c(M_{a.c} . M_{b.c}) = M_a . E_c(M_{b.c}) = M_a . M_b.$$

If the c sets of paired scores each have the same fixed set of A-values in the sense defined, it 
therefore follows that 

$$E_c . \mathrm{Cov}(x_{a.c}, x_{b.c}) = E(x_a . x_b) - M_a . M_b = \mathrm{Cov}(x_a, x_b) \qquad \text{(xxii)}$$
11.07 SUMMATION BY FIGURATE SERIES 


In Vol. I (Chapter 1) we have seen that it is possible to obtain expressions for certain power 
series relevant to determination of zero moments by recourse to figurate number series. To 
derive results we shall later use in determining moments of a distribution, in particular sum- 
mation of products of reduced factorials, it is also convenient to make use of the properties of the 
family of figurate numbers (s = 3 in Fig. 86) to which the unit, natural numbers, triangular 
numbers and tetrahedral numbers belong. For the foregoing we may use the symbols ${}^0F_n$, 
${}^1F_n$, ${}^2F_n$, ${}^3F_n$ severally to denote the term of rank n and in general ${}^dF_n$ for that of the d-dimensional 
class generated from its predecessor of (d - 1) dimensions in accordance with the same additive 
law, so that for positive integral values of d and n: 

$${}^dF_n = \sum_{r=1}^{n} {}^{d-1}F_r$$

The general expression for the family is a reduced factorial, viz. : 

$${}^dF_n = \frac{(n + d - 1)^{(d)}}{d!}$$

Fig. 86. Figurate Numbers in 3 dimensions. 


If we extend these series into the domain of negative integers the law of summation is (ix) of 
1.02, viz. : i 


=n (n -H d ea Dia, . ° . “ (ii) 


ee aes fe a 
Hence if k > 2 and n > 
r=n—k-+1 
“E, = aoe T a 1)" x a ar 
r= — (k — 2) 
Since *+1F _, = 0 for all positive values of k 
r=n—k+1 (n + 1)@+D 
oF eR E = ——— . ; ` Eo 
E, AN k+1 (k m 1)! ( ) 
Now we may write for a sum of factorial powers of the integers 
L=N T=N ylk) z=n r=n—k+1 
ets AR A StF. 
z=1 aa Ri a=0 r=—k+2 


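Whence from (iv) when k > 2 
$$\sum_{x=1}^{n} x^{(k)} = \frac{(n + 1)^{(k+1)}}{k + 1} \qquad \text{(v)}$$
When k = 1 or 2, this is evidently true, since 

$$\sum_{x=1}^{n} x^{(1)} = \sum_{x=1}^{n} x = \frac{(n + 1)^{(2)}}{2} ;$$

$$\sum_{x=1}^{n} x^{(2)} = 2\sum_{x=1}^{n} {}^2F_{x-1} = 2\sum_{r=0}^{n-1} {}^2F_r = \frac{(n + 1)^{(3)}}{3}.$$

Hence (v) is valid for all positive integral values of k < n. 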
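
The summation rule for factorial powers, i.e. that the sum of $x^{(k)}$ from x = 1 to x = n is $(n+1)^{(k+1)} \div (k+1)$, where $x^{(k)} = x(x-1) \dots (x-k+1)$, is easy to test numerically. The sketch below is an illustration only; the helper names `falling`, `direct_sum` and `closed_form` are inventions of this note, not symbols of the text.

```python
def falling(n, k):
    """Factorial power n^(k) = n (n-1) ... (n-k+1), with n^(0) = 1."""
    out = 1
    for i in range(k):
        out *= n - i
    return out


def direct_sum(n, k):
    """Sum of x^(k) for x = 1, 2, ..., n by straightforward addition."""
    return sum(falling(x, k) for x in range(1, n + 1))


def closed_form(n, k):
    """(n+1)^(k+1) / (k+1); the division is exact because (n+1)^(k+1)
    is (k+1)! times a binomial coefficient."""
    return falling(n + 1, k + 1) // (k + 1)


# Spot check: n = 6, k = 3 on both sides.
print(direct_sum(6, 3), closed_form(6, 3))
```

For k = 1 the rule reduces to the familiar sum of the natural numbers, and for k = 2 to twice the sum of the triangular numbers.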
In the positive domain the following relationship is also of fundamental importance : 
$${}^dF_n = {}^{n-1}F_{d+1} \qquad \text{(vi)}$$

In virtue of these identities we may now establish the following theorems relating to the sum of 
products of reduced factorials : 

$$\sum_{c=0}^{n} c_{(x)} (n - c)_{(s-x)} = \frac{(n + 1)^{(s+1)}}{(s + 1)!} \qquad \text{(vii)}$$

$$\sum_{x=0}^{r} \frac{(k + x - 1)^{(x)} (m + r - x - 1)^{(r-x)}}{x! \, (r - x)!} = \frac{(k + m + r - 1)^{(r)}}{r!} \qquad \text{(viii)}$$

We may write the last in the form 

$$\sum_{x=0}^{r} r_{(x)} (k + x - 1)^{(x)} (m + r - x - 1)^{(r-x)} = (k + m + r - 1)^{(r)} \qquad \text{(ix)}$$
xz =0 


We can derive these from a consideration of the figurate form of the binomial expansion 


dat a1 +e 4 EDS IAS 


31 E ete 
OF x9 + LF x! + 2Fyx? + 3Fyx3, etc 
7=0 
pe > Bx", 
r= 0 
r= 00 
Te a > E g 
r=0 
=(1—x)* (1 — x)” 
= (°F ,x° + Fx + Fo? + 3F yx? E) 
X OFX + FE nx + Fox? + "Fox? . . . etc.) 
Fe. OF x + OF, . Em + Fi F yat 


HOP es *Fn + Fr. Fm + Fr 4 En = ia 


(Es En + Fr Be + Pe Fn + Er OP nl. ee 
r=0 =r 
== 2 > E N 


r=0 x=0 


r= r=0%= 
Te E "Fy m-* BS. d — E = Tr y PA 
r= r=0 =0 
Hence by equating coefficients we obtain 


z=? 
T =f aes 
x=0 


This is equivalent to (viii) and (ix) above 


t =r 


pa (x) 
If we put k = n = m it becomes 
yz (nm + — Dealt EA sa le + > — De 


Now we may write the expression on the left of (vii) in the form 


n c=02N : 
va Cain — C)a a = 2 "Fe el ITA 
Since c* 


= (0 unless c > x, we may set the lower limit of summation at c = x, and since 
(n — So = 0 unless (n — c) > (s — x) we may set the upper limit of summation at 
c = (n — s + 2), Le. 

c= 


n c=(n—8+2) c=(n—8+2) 
> Ci (2 = c)s => Cay (A de ua = Mei 
c=0 c=x 


En 
c=x 


(1x) 
We now put z = (c - x) 

$$\sum_{c=0}^{n} c_{(x)} (n - c)_{(s-x)} = \sum_{z=0}^{n-s} {}^xF_{z+1} . {}^{s-x}F_{n-s-z+1}$$

In virtue of (vi) and of (x) above, the expression on the right is equivalent to 

$$\sum_{z=0}^{n-s} {}^zF_{x+1} . {}^{n-s-z}F_{s-x+1} = {}^{n-s}F_{s+2} = {}^{s+1}F_{n-s+1}$$

In the notation of reduced factorials 

$${}^{s+1}F_{n-s+1} = \frac{(n + 1)^{(s+1)}}{(s + 1)!}$$

Hence in accordance with (vii) 

$$\sum_{c=0}^{n} c_{(x)} (n - c)_{(s-x)} = \frac{(n + 1)^{(s+1)}}{(s + 1)!} = (n + 1)_{(s+1)} \qquad \text{(xii)}$$
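We shall later make use of an extension of (xii) which we can derive by recourse to the identity 

$$(c + 1)_{(x+1)} = \frac{(c + 1)^{(x+1)}}{(x + 1)!} = \frac{(c + 1) . c^{(x)}}{(x + 1) . x!} = \frac{(c + 1) . c_{(x)}}{x + 1}$$

$$\therefore (x + 1)(c + 1)_{(x+1)} = c . c_{(x)} + c_{(x)} \qquad \text{(xiii)}$$

In virtue of (xiii) we may thus write 

$$\sum_{c=0}^{n} c . c_{(x)} (n - c)_{(s-x)} = (x + 1) \sum_{c=0}^{n} (c + 1)_{(x+1)} (n - c)_{(s-x)} - \sum_{c=0}^{n} c_{(x)} (n - c)_{(s-x)} \qquad \text{(xiv)}$$

To reduce the first term on the right to the same form as (xii), we put y = (x + 1), 
z = (s + 1), m = (n + 1) and u = (c + 1) so that u = 1 when c = 0 and u = m when c = n, 
whence 

$$\sum_{c=0}^{n} (c + 1)_{(x+1)} (n - c)_{(s-x)} = \sum_{u=1}^{m} u_{(y)} (m - u)_{(z-y)}$$

Since $u_{(y)} = 0$ when u = 0 

$$\sum_{c=0}^{n} (c + 1)_{(x+1)} (n - c)_{(s-x)} = \sum_{u=0}^{m} u_{(y)} (m - u)_{(z-y)} = (m + 1)_{(z+1)}$$

$$\therefore \sum_{c=0}^{n} (c + 1)_{(x+1)} (n - c)_{(s-x)} = (n + 2)_{(s+2)}$$

Hence in virtue of (xii) above, (xiv) becomes 

$$\sum_{c=0}^{n} c . c_{(x)} (n - c)_{(s-x)} = (x + 1)(n + 2)_{(s+2)} - (n + 1)_{(s+1)} \qquad \text{(xv)}$$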
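
Both results on sums of products of reduced factorials can be checked by brute force. The sketch below assumes that the reduced factorial $n_{(r)}$ is the binomial coefficient, i.e. $n^{(r)} \div r!$, and uses Python's `math.comb` for it; the function names are hypothetical labels for the left-hand sides of the two identities, namely that the sum of $c_{(x)}(n-c)_{(s-x)}$ over c = 0 to n is $(n+1)_{(s+1)}$, and that the same sum with the extra factor c is $(x+1)(n+2)_{(s+2)} - (n+1)_{(s+1)}$.

```python
from math import comb


def sum_of_products(n, s, x):
    """Sum over c = 0..n of c_(x) * (n - c)_(s - x)."""
    return sum(comb(c, x) * comb(n - c, s - x) for c in range(n + 1))


def weighted_sum_of_products(n, s, x):
    """Sum over c = 0..n of c * c_(x) * (n - c)_(s - x)."""
    return sum(c * comb(c, x) * comb(n - c, s - x) for c in range(n + 1))


def check(n, s, x):
    """True if both identities hold for the given n, s, x (0 <= x <= s <= n)."""
    first = sum_of_products(n, s, x) == comb(n + 1, s + 1)
    second = weighted_sum_of_products(n, s, x) == (
        (x + 1) * comb(n + 2, s + 2) - comb(n + 1, s + 1)
    )
    return first and second


print(all(check(n, s, x)
          for n in range(10)
          for s in range(n + 1)
          for x in range(s + 1)))
```

A numerical check of this kind is of course no substitute for the derivation, but it guards against a slip of sign or suffix in transcribing the formulae.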


11.08 THE GENERATING FUNCTION AS A GRID OPERATION 


To the beginner the generating functions touched on in Chapter 6, Vol. I somewhat savour of 
being wise after the event. They cease to have an air of mystery when we recognise them as 
devices for summarising the operations of the independence grid (Fig. 87). 

Fig. 87. PACKING UP THE CHESSBOARD. Probability Generating Function for Score-Sum and 
Score-Difference as a convention for labelling the cells of the chessboard. (Panels : sums and 
frequencies for a 5-fold and a 4-fold sample ; the difference as a sum ; p.g.f. of the (5 + 4)-fold 
sample raw score-sum ; p.g.f. of the raw score difference w.r.t. 5-fold and 4-fold samples.) 

When we lay out a grid with border scores 0, 1, 2 ... etc. with corresponding frequencies 
$u_0, u_1, u_2$ ... etc. and $v_0, v_1, v_2$ ... etc., a cell whose score-sum entry is s = a + b is one 
whose frequency entry is $y_{a(s-a)} = u_a . v_{s-a}$ ; and the total frequency of s for the 2-fold sample 
is the right-left descending diagonal sum of all such cell entries, as set forth in the following 
schema for the score-sum (s) of unit samples from each of two 4-class universes with score range 0-3: 


               0               1               2               3 
               u0              u1              u2              u3 
0    v0     s = 0           s = 1           s = 2           s = 3 
            y00 = u0.v0     y10 = u1.v0     y20 = u2.v0     y30 = u3.v0 
1    v1     s = 1           s = 2           s = 3           s = 4 
            y01 = u0.v1     y11 = u1.v1     y21 = u2.v1     y31 = u3.v1 
2    v2     s = 2           s = 3           s = 4           s = 5 
            y02 = u0.v2     y12 = u1.v2     y22 = u2.v2     y32 = u3.v2 
3    v3     s = 3           s = 4           s = 5           s = 6 
            y03 = u0.v3     y13 = u1.v3     y23 = u2.v3     y33 = u3.v3 


As they appear in the grid lay-out the frequency cell entries $y_{a(s-a)}$ ($= u_a . v_{s-a}$) lie diagonally 
as on the left below. We can bring all terms of a diagonal referable to the same score-sum s 
into line vertically by sliding the rows as on the right : 


y00  y10  y20  y30 ...          y00  y10  y20  y30 ... 
y01  y11  y21  y31 ...               y01  y11  y21  y31 ... 
y02  y12  y22  y32 ...                    y02  y12  y22  y32 ... 
y03  y13  y23  y33 ...                         y03  y13  y23  y33 ... 


The formula for the frequency of the score-sum s is then 

$$Y_s = \sum_{x=0}^{s} y_{x(s-x)}$$

This exhibits the result of applying the product rule as a lay-out on all fours with the familiar 
algorithm of multiplication. Indeed, the earliest commercial arithmetics set out gridwise the 
procedure for multiplication with the Hindu-Arabic numerals, and indicated the diagonal totals 
at the margins. Each such diagonal sum is then a factor of the corresponding power of 10 ; 
and we can make the procedure more explicit by attaching to each border-score frequency $y_x$ 
a dummy factor t whose exponent is the corresponding score x. When we then apply the 
product rule, the cell frequency $y_{ab}$ carries along with it a factor $t^s$ whose exponent is the 
corresponding score-sum cell entry s. The chessboard operation for the score-sum of any 2 independent 
samples then assumes the compact form : 


             u0.t^0      u1.t^1      u2.t^2      u3.t^3 

v0.t^0       y00.t^0     y10.t^1     y20.t^2     y30.t^3 
v1.t^1       y01.t^1     y11.t^2     y21.t^3     y31.t^4 
v2.t^2       y02.t^2     y12.t^3     y22.t^4     y32.t^5 
v3.t^3       y03.t^3     y13.t^4     y23.t^5     y33.t^6 


If the border frequencies in this lay-out refer to a unit sample distribution from different 
universes of 4 score classes, we define as a probability generating function of one or other universe 
each with the range 0-3: 

$$(u_0t^0 + u_1t^1 + u_2t^2 + u_3t^3) \quad \text{and} \quad (v_0t^0 + v_1t^1 + v_2t^2 + v_3t^3).$$

We then specify a corresponding probability generating function of the 2-fold sample score-sum 

$$Y_0t^0 + Y_1t^1 + Y_2t^2 + Y_3t^3 + Y_4t^4 + Y_5t^5 + Y_6t^6 = (u_0t^0 + u_1t^1 + u_2t^2 + u_3t^3)(v_0t^0 + v_1t^1 + v_2t^2 + v_3t^3).$$

If we take the samples from the same indefinitely large 4-class universe without replacement or 
from any 4-class universe with replacement, the above becomes 

$$Y_0t^0 + Y_1t^1 + Y_2t^2 + Y_3t^3 + Y_4t^4 + Y_5t^5 + Y_6t^6 = (u_0t^0 + u_1t^1 + u_2t^2 + u_3t^3)^2.$$

By successive application of the chessboard operation, the appropriate p.g.f. of the r-fold sample 
score sum is 

$$(u_0t^0 + u_1t^1 + u_2t^2 + u_3t^3)^r = Y_0t^0 + Y_1t^1 + Y_2t^2 \dots Y_{3r}t^{3r}.$$

In any of the foregoing expressions the coefficient ($u_x$, $v_x$, $Y_x$) of $t^x$ is the frequency of the 
score x. By inserting the dummy factor t in the operation for deriving the score-sum distribution, 
we can pick out the frequency of a particular score-sum without invoking the chessboard 
lay-out. 
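
That the diagonal totals of the chessboard and the coefficients of the product of two generating functions are one and the same thing can be seen by carrying out both operations side by side. The following Python sketch is illustrative only: the two unit sample distributions `u` and `v` are arbitrary figures assumed for the purpose, and the function names are not part of the text.

```python
from fractions import Fraction


def grid_diagonal_totals(u, v):
    """Lay out the independence grid y[a][b] = u[a] * v[b] and add up
    the cells of each diagonal on which the score-sum a + b = s."""
    grid = [[ua * vb for vb in v] for ua in u]
    totals = [Fraction(0)] * (len(u) + len(v) - 1)
    for a, row in enumerate(grid):
        for b, cell in enumerate(row):
            totals[a + b] += cell
    return totals


def pgf_product(u, v):
    """Coefficients of (sum u_x t^x)(sum v_x t^x), each obtained as
    Y_s = sum over x of u_x * v_(s-x) without drawing the grid."""
    size = len(u) + len(v) - 1
    return [
        sum((u[x] * v[s - x]
             for x in range(s + 1)
             if x < len(u) and s - x < len(v)), Fraction(0))
        for s in range(size)
    ]


# Two assumed 4-class universes with score range 0-3.
u = [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]
v = [Fraction(1, 4)] * 4

print(grid_diagonal_totals(u, v))
print(pgf_product(u, v))
```

The first function is the chessboard operation proper ; the second never forms the grid, and picks out each score-sum frequency directly from the exponent of the dummy factor.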
Example 1. A tetrahedral die has on its four faces 1, 2, 3 pips in the ratio 1 : 2 : 1 as in Fig. 70 
of Vol. I. The unit sample frequency distribution is therefore 1/4, 1/2, 1/4. Hence a p.g.f. of the u.s.d. is 
$$\tfrac{1}{4}t + \tfrac{1}{2}t^2 + \tfrac{1}{4}t^3 = \tfrac{1}{4}(t + 2t^2 + t^3).$$

To obtain the frequency of a total score of five in a 3-fold toss we write: 
$$\frac{1}{4^3}(t + 2t^2 + t^3)^3 = \frac{1}{64}(t^3 + 6t^4 + 15t^5 + 20t^6 + 15t^7 + 6t^8 + t^9).$$

The coefficient of $t^5$ in this expression is 15/64, which is the frequency of the score-sum 5 or the 3-fold 
sample mean score of 5/3. 

In general, we may define a generating function ($G_u$) of the unit sample distribution 
($u_0, u_1, u_2 \dots u_n$) of (n + 1) score classes thus 

$$G_u = u_0t^0 + u_1t^1 + u_2t^2 + \dots + u_nt^n.$$

Since $u_x = 0$ for values of x exceeding n, we may write this as 

$$G_u = \sum_{x=0}^{\infty} u_x t^x$$

If the frequency of a score-sum $x_a$ in the a-fold sample distribution with replacement is $a_x$, and 
that of $x_b$ in the b-fold sample is $b_x$, the corresponding generating functions of the samples are 

$$G(x_a) = \sum_{x=0}^{\infty} a_x t^x \quad \text{and} \quad G(x_b) = \sum_{x=0}^{\infty} b_x t^x.$$

The product rule for independence then signifies that the corresponding generating function of 
the distribution of the score-sum $s = (x_a + x_b)$ is 

$$G(x_a + x_b) = G(x_a) . G(x_b) = \sum_{s=0}^{\infty} \sum_{x=0}^{s} a_x b_{s-x} t^s \qquad \text{(ii)}$$

If we write $G_2(s)$, $G_3(s)$, etc. for the p.g.f. of the score-sum of 2-fold, 3-fold, etc. samples, $G_2(s) = G_u^2$, 
$G_3(s) = G_u^2 . G_u = G_u^3$, etc., and in general 

$$G_a(s) = G_u^a \quad \text{and} \quad G_{(a+b)}(s) = G_u^a . G_u^b \qquad \text{(iii)}$$

Example 2. For sampling from an infinite 2-class universe $G_u = u_0 + u_1t^1 = q + pt$, whence 

$$G_a(s) = (q + pt)^a = \sum_{x=0}^{a} a_{(x)} . q^{a-x} . p^x . t^x.$$

In this the coefficient of $t^x$ is the familiar result for the sample-score x. 


Example 3. What is the frequency of a mean score of 3.5 if we toss an ordinary cubical die twice ? 
The score-sum is then 7 and we have 

$$G_u = \tfrac{1}{6}(t + t^2 + t^3 + t^4 + t^5 + t^6) ;$$

$$G_2(s) = \tfrac{1}{36}(t + t^2 + t^3 + t^4 + t^5 + t^6)^2 = \tfrac{1}{36}(t^2 + 2t^3 + 3t^4 + 4t^5 + 5t^6 + 6t^7 \dots).$$

In this the coefficient of $t^7$ is 6 ÷ 36 = 1/6. 
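
The two die problems above involve nothing more than raising a polynomial to a power and reading off one coefficient. A minimal sketch in Python follows, with the p.g.f. stored as a list whose position x holds the coefficient of $t^x$ ; the names `poly_mul` and `poly_pow` are assumptions of this illustration.

```python
from fractions import Fraction


def poly_mul(f, g):
    """Multiply two p.g.f.s held as coefficient lists (index = exponent of t)."""
    out = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out


def poly_pow(f, r):
    """p.g.f. of the r-fold sample score-sum: the r-th power of f."""
    out = [Fraction(1)]
    for _ in range(r):
        out = poly_mul(out, f)
    return out


# Tetrahedral die: 1, 2, 3 pips with frequencies 1/4, 1/2, 1/4.
tetrahedral = [Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
# Cubical die: 1 to 6 pips, each with frequency 1/6.
cubical = [Fraction(0)] + [Fraction(1, 6)] * 6

three_tetrahedral = poly_pow(tetrahedral, 3)
two_cubical = poly_pow(cubical, 2)

print(three_tetrahedral[5])   # frequency of score-sum 5 in the 3-fold toss
print(two_cubical[7])         # frequency of score-sum 7 in the double toss
```

The same two functions serve for any universe of which the scores increase from zero by unit steps, and for any size of sample.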


* * * * * * * 
If $a_x$ is the frequency of the score $x_a$ in the a-fold sample distribution, it is also that of the 
mean (or proportionate) score ($x_a \div a$), and we can adapt the foregoing definition of the p.g.f. of 
the score-sum to specify the mean (or proportionate) score frequency by substituting the dummy 
factor $t^{1/a}$ for t, so that $a_x$ is then the coefficient of the t-term whose exponent is the mean score 
itself, i.e. 

$$G\left(\frac{x_a}{a}\right) = \sum_{x=0}^{\infty} a_x t^{x/a}$$

We should then write for the multiple toss of the cubical die 

$$G\left(\frac{x_a}{a}\right) = \frac{1}{6^a}\left(t^{1/a} + t^{2/a} + t^{3/a} + t^{4/a} + t^{5/a} + t^{6/a}\right)^a$$

We have so far assumed that our unit sample scores increase from zero by unit steps. If 
they increase from m by equal steps Δm, we may interpret $u_x$ as the frequency of the score 
m + xΔm in the p.g.f. 

$$G_u = u_0t^{m} + u_1t^{m+\Delta m} + u_2t^{m+2\Delta m} \dots \text{etc.}$$

In the a-fold score-sum distribution the minimum score will be am and the corresponding p.g.f. 
will be 

$$G(x_a) = a_0t^{am} + a_1t^{am+\Delta m} + a_2t^{am+2\Delta m} \dots \text{etc.} \qquad \text{(vi)}$$

For the corresponding proportionate or mean score, the p.g.f. will then be 

$$G\left(\frac{x_a}{a}\right) = a_0t^{m} + a_1t^{m+\frac{\Delta m}{a}} + a_2t^{m+\frac{2\Delta m}{a}} \dots \text{etc.}$$
a 


Example 4. The scores on the faces of a tetrahedral die are 2, 5, 5, 8, i.e. 2 + 0(3), 2 + 1(3) and 
2 + 2(3) in the ratio 1 : 2 : 1, so that 

$$G_u = \tfrac{1}{4}(t^2 + 2t^5 + t^8) = \frac{t^2}{4}(1 + 2t^3 + t^6) = \frac{t^2}{4}(1 + t^3)^2.$$

For the a-fold sample score-sum : 

$$G(x_a) = G_u^a = \frac{t^{2a}}{4^a}(1 + t^3)^{2a} = \frac{t^{2a}}{4^a}\sum_{x=0}^{2a} (2a)_{(x)} t^{3x}.$$

If the score-sum is 15 in a 3-fold toss, a = 3 and the relevant term in the expansion is the one whose t 
exponent is (2a + 3x) = 15, so that x = 3, whence the required frequency is 

$$\frac{1}{4^3} . 6_{(3)} = \frac{6!}{3! \, 3!} . \frac{1}{64} = \frac{20}{64} = \frac{5}{16}.$$
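
When the scores do not start at zero or do not increase by unit steps, it is convenient to carry the exponent of t explicitly. The sketch below, offered as an illustration rather than as part of the text, holds a p.g.f. as a mapping from exponent to coefficient and recomputes the frequency of a score-sum of 15 in a 3-fold toss of the die with faces 2, 5, 5, 8, comparing it with the closed form $(2a)_{(x)} \div 4^a$. The helper names are hypothetical.

```python
from fractions import Fraction
from math import comb


def multiply(f, g):
    """Product of two p.g.f.s stored as {exponent of t: coefficient}."""
    out = {}
    for ef, cf in f.items():
        for eg, cg in g.items():
            out[ef + eg] = out.get(ef + eg, Fraction(0)) + cf * cg
    return out


def power(f, r):
    """r-th power of a p.g.f., i.e. the p.g.f. of the r-fold score-sum."""
    out = {0: Fraction(1)}
    for _ in range(r):
        out = multiply(out, f)
    return out


# Faces 2, 5, 5, 8: scores 2, 5, 8 with frequencies 1/4, 1/2, 1/4.
die = {2: Fraction(1, 4), 5: Fraction(1, 2), 8: Fraction(1, 4)}

a = 3
three_fold = power(die, a)
by_expansion = three_fold[15]

# Closed form: the exponent 2a + 3x = 15 gives x = 3.
x = (15 - 2 * a) // 3
by_formula = Fraction(comb(2 * a, x), 4 ** a)

print(by_expansion, by_formula)
```

The mapping form of the p.g.f. accommodates negative and fractional exponents equally well, which is what the treatment of score differences and of mean scores requires.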


* * * * * * * 


We may now generalise an important result already obtained in Chapter 7 of Vol. I. We 
may define a binomial variate in the domain of representative scoring as such if the frequencies 
of score values m + xΔm increasing by equal steps Δm in the range m to (m + kΔm) tally with 
terms of the expansion $(q + p)^k$. 

$$G_u = t^{m}(q + pt^{\Delta m})^{k},$$
$$\therefore G(x_a) = t^{am}(q + pt^{\Delta m})^{ak} \quad \text{and} \quad G(x_a + x_b) = t^{(a+b)m}(q + pt^{\Delta m})^{(a+b)k}.$$

Thus the frequencies of the score-sum am + xΔm in the a-fold sample tally with successive 
terms of the expansion $(q + p)^{ak}$. We may express this by saying that a binomial variate defines 
the distribution of the score-sum (and mean score) of samples of any size from a universe of which 
the u.s.d. is a binomial variate. If the variance of the u.s.d. is kpq, that of the a-fold sample 
score-sum distribution is akpq. Hitherto, it has been our custom to lay out the grid for the 
raw-score difference (d) of independent a-fold and b-fold samples, as below (a = 4, b = 3): 


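               0              1              2              3              4 
               u0             u1             u2             u3             u4 
0    v0     d = 0          d = 1          d = 2          d = 3          d = 4 
            y00 = u0.v0    y10 = u1.v0    y20 = u2.v0    y30 = u3.v0    y40 = u4.v0 
1    v1     d = -1         d = 0          d = 1          d = 2          d = 3 
            y01 = u0.v1    y11 = u1.v1    y21 = u2.v1    y31 = u3.v1    y41 = u4.v1 
2    v2     d = -2         d = -1         d = 0          d = 1          d = 2 
            y02 = u0.v2    y12 = u1.v2    y22 = u2.v2    y32 = u3.v2    y42 = u4.v2 
3    v3     d = -3         d = -2         d = -1         d = 0          d = 1 
            y03 = u0.v3    y13 = u1.v3    y23 = u2.v3    y33 = u3.v3    y43 = u4.v3 


The rule for the individual cell entries $y_{ij} = u_i v_j$ is as before but the rule of diagonal summation 
for the particular difference d is different from the rule for the sum (s) being 

$$Y_d = \sum_{j} y_{(d+j)j} = \sum_{j} u_{d+j} . v_j$$

If the maximum value of i is u and that of j is v the range of d is from - v up to + u, as from 
- 3 to + 4 in the foregoing schema. For which we may write the generating function of the 
difference as 

$$Y_{-3}t^{-3} + Y_{-2}t^{-2} + Y_{-1}t^{-1} + Y_0t^0 + Y_1t^1 + Y_2t^2 + Y_3t^3 + Y_4t^4.$$

More generally, for the difference $d = (x_a - x_b)$ of two independent variates 

$$G(x_a - x_b) = \sum_{d=-v}^{u} Y_d t^d$$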


Now we may write the difference $d = (x_a - x_b)$ as a sum, viz.: $d = x_a + (- x_b)$. If, as in Fig. 3, 
we lay out our row border-scores as negative values reversing the sign of the exponent of the 
attached dummy t accordingly, we may define a new g.f. : 

$$G(- x_b) = b_0t^0 + b_1t^{-1} + b_2t^{-2} + b_3t^{-3} \dots \text{etc.}$$

As before we write the g.f. of the column border-scores 

$$G(x_a) = a_0t^0 + a_1t^1 + a_2t^2 + a_3t^3 \dots \text{etc.}$$

Whence we have 

$$G(x_a) . G(- x_b) = \sum_{d} \sum_{j} a_{d+j} . b_j . t^d$$

$$\therefore G(x_a) . G(- x_b) = G(x_a - x_b) \qquad \text{(viii)}$$
For the unit sample difference universe of (n + 1) score classes we may write for brevity 

$$G_u = \sum_{x=0}^{\infty} u_x t^{x} \quad \text{and} \quad G_{-u} = \sum_{x=0}^{\infty} u_x t^{-x}$$

The distribution of the b-fold sample is obtained by expanding $G_u^b$ of which the terms are identical 
with those of $G_{-u}^b$ if we reverse the sign of the exponent of the dummy t. If $x_b$ is the b-fold 
sample score from such a universe we may therefore write 

$$G(- x_b) = G_{-u}^b.$$

As before, $G_u^a$ is the g.f. of the distribution of the a-fold sample score ($x_a$), so that $G_u^a = G(x_a)$ and 

$$G(x_a - x_b) = G(x_a) . G(- x_b) = G_u^a . G_{-u}^b.$$

If the samples come from different universes we may write this in a more general form as 

$$G(x_a - x_b) = G_u^a . G_{-v}^b = (u_0 + u_1t^1 + u_2t^2 \dots)^a (v_0 + v_1t^{-1} + v_2t^{-2} \dots)^b \qquad \text{(ix)}$$


Example 5. For the distribution of the raw-score difference between a- and b-fold samples from an 
indefinitely large 2-class universe 

$$G(x_a) = (q + pt)^a ; \quad G(- x_b) = (q + pt^{-1})^b ; \quad G(x_a - x_b) = (q + pt)^a (q + pt^{-1})^b.$$

If a = 3, b = 2, we may set forth the operation as below : 

$$q^3 + 3q^2p\,t + 3qp^2t^2 + p^3t^3$$
$$q^2 + 2qp\,t^{-1} + p^2t^{-2}$$
$$q^5 + 3q^4p\,t + 3q^3p^2t^2 + q^2p^3t^3$$
$$2q^4p\,t^{-1} + 6q^3p^2t^0 + 6q^2p^3t + 2qp^4t^2$$
$$q^3p^2t^{-2} + 3q^2p^3t^{-1} + 3qp^4t^0 + p^5t$$

Whence we derive the following distribution of the score difference : 

Difference      Frequency 
  + 3           q^2.p^3 
  + 2           3q^3.p^2 + 2q.p^4 
  + 1           3q^4.p + 6q^2.p^3 + p^5 
    0           q^5 + 6q^3.p^2 + 3q.p^4 
  - 1           2q^4.p + 3q^2.p^3 
  - 2           q^3.p^2 

This is in agreement with the chessboard lay-out below, in which each cell shows the difference 
and the product of the border frequencies : 

                 0              1               2               3 
                 q^3            3q^2.p          3q.p^2          p^3 
0    q^2      d = 0          d = 1           d = 2           d = 3 
              q^5            3q^4.p          3q^3.p^2        q^2.p^3 
1    2qp      d = -1         d = 0           d = 1           d = 2 
              2q^4.p         6q^3.p^2        6q^2.p^3        2q.p^4 
2    p^2      d = -2         d = -1          d = 0           d = 1 
              q^3.p^2        3q^2.p^3        3q.p^4          p^5 
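
The expansion for the raw-score difference can be checked for any particular value of p by letting the machine multiply the two generating functions, the second with negative exponents. The sketch below is an illustration under stated assumptions: the value p = 1/4 is chosen arbitrarily, and the names `multiply`, `power` and `difference_distribution` are not in the text.

```python
from fractions import Fraction
from collections import defaultdict


def multiply(f, g):
    """Product of two g.f.s stored as {exponent of t: coefficient};
    exponents may be negative."""
    out = defaultdict(Fraction)
    for ef, cf in f.items():
        for eg, cg in g.items():
            out[ef + eg] += cf * cg
    return dict(out)


def power(f, r):
    """r-th power of a g.f."""
    out = {0: Fraction(1)}
    for _ in range(r):
        out = multiply(out, f)
    return out


def difference_distribution(p, a, b):
    """Distribution of (a-fold score) - (b-fold score) for a 2-class
    universe: (q + p t)^a (q + p t^-1)^b."""
    q = 1 - p
    positive = {0: q, 1: p}    # q + p t
    negative = {0: q, -1: p}   # q + p t^(-1)
    return multiply(power(positive, a), power(negative, b))


p = Fraction(1, 4)             # assumed value, for illustration only
q = 1 - p
dist = difference_distribution(p, 3, 2)
for d in sorted(dist, reverse=True):
    print(d, dist[d])
```

Each printed frequency may be compared with the corresponding algebraic entry, e.g. that for zero difference with $q^5 + 6q^3p^2 + 3qp^4$.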


The results summarised in (viii)-(ix) refer to a raw-score difference. They are easily adaptable 
to the description of the distribution of a score deviation or to a proportionate score, since 
the only function of the exponent of t is to label the score itself. To make them do the task 
required, all we therefore have to do is to label $t^x$ so that x is in fact the score which is our 
concern. Thus we substitute $t^{x-M}$ for $t^x$ in the expressions involved in (viii) if our concern is with 
the score deviation, and $t^{x/a}$ for $t^x$ and $t^{-x/b}$ for $t^{-x}$ if our concern is with the proportionate score. 
For the proportionate score difference $\left(\frac{x_a}{a} - \frac{x_b}{b}\right)$ of a-fold and b-fold samples from an infinite 
2-class universe, we therefore have 

$$G\left(\frac{x_a}{a} - \frac{x_b}{b}\right) = \left(q + pt^{1/a}\right)^a \left(q + pt^{-1/b}\right)^b \qquad \text{(x)}$$

For the unit sample from an infinite 2-class universe M = p, X = - p when x = 0 and 
X = (1 - p) = q when x = 1. We therefore write the g.f. of the u.s.d. as 

$$G_u(X) = qt^{-p} + pt^{q} = t^{-p}(q + pt),$$
$$\therefore G_r(X) = t^{-rp}(q + pt)^r.$$


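Example 6. For the heart-score deviation of the 3-fold sample with replacement from a full pack 

$$p = \tfrac{1}{4}, \quad q = \tfrac{3}{4} ;$$

$$G_3(X) = t^{-3/4}\left(\tfrac{3}{4} + \tfrac{1}{4}t\right)^3 = \tfrac{1}{64}\left(27t^{-3/4} + 27t^{1/4} + 9t^{5/4} + t^{9/4}\right).$$

Thus the distribution is 

Score deviation     - 3/4      1/4       5/4       9/4 
Frequency           27/64     27/64      9/64      1/64 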


The rule for adapting the form of the p.g.f. to take account of Change of Scale and Origin 
is simple. Consider the following score distributions which differ w.r.t. scale and origin alone : 

Score A        m      m + a     m + 2a     m + 3a     m + 4a     m + 5a ... 
Score B        q      q + b     q + 2b     q + 3b     q + 4b     q + 5b ... 
Frequency      u0     u1        u2         u3         u4         u5 ... 

We may write 

$$G(A) = \sum_{0}^{\infty} u_x t^{m+ax} = t^{m} \sum_{0}^{\infty} u_x t^{ax}$$

$$G(B) = \sum_{0}^{\infty} u_x t^{q+bx} = t^{q} \sum_{0}^{\infty} u_x t^{bx}$$

Thus the effect of multiplying the g.f. by $t^{k}$ is to change the origin from m to (m + k) 
or from q to (q + k) as the case may be. We may reduce both expressions to a form involving 
the p.g.f. of the distribution with unit scale and zero origin by putting 

$$t^{a} = h \quad \text{and} \quad t^{b} = g.$$

We then have 

$$G(A) = h^{m/a} \sum_{0}^{\infty} u_x . h^{x} \quad \text{and} \quad G(B) = g^{q/b} \sum_{0}^{\infty} u_x . g^{x}.$$


Symmetrical Distributions 


It is a property of symmetrical distributions that the distribution of the raw-score sum of 
a-fold and b-fold independent samples has the same form as that of the raw-score difference 
from the same universe, i.e. the only difference between the two being referable to the origin. 
Consider the following symmetrical u.s.d. of a universe of 7 score classes : 

Score         m      m + q     m + 2q     m + 3q     m + 4q      m + 5q      m + 6q 
Frequency     u0     u1        u2         u3         u4 = u2     u5 = u1     u6 = u0 

The p.g.f. of the u.s.d. is 

$$G_u = \sum_{0}^{6} u_x t^{m+qx} = t^{m} \sum_{0}^{6} u_x t^{qx}$$
$$= t^{m}(u_0t^{0} + u_1t^{q} + u_2t^{2q} + u_3t^{3q} + u_4t^{4q} + u_5t^{5q} + u_6t^{6q})$$
$$= t^{m+3q}(u_0t^{-3q} + u_1t^{-2q} + u_2t^{-q} + u_3 + u_2t^{q} + u_1t^{2q} + u_0t^{3q}).$$

For the distribution of negative scores we may write 

$$G_{-u} = \sum_{0}^{6} u_x t^{-(m+qx)} = t^{-m} \sum_{0}^{6} u_x t^{-qx}$$
$$= t^{-m}(u_0t^{0} + u_1t^{-q} + u_2t^{-2q} + u_3t^{-3q} + u_4t^{-4q} + u_5t^{-5q} + u_6t^{-6q})$$
$$= t^{-(m+3q)}(u_0t^{3q} + u_1t^{2q} + u_2t^{q} + u_3 + u_2t^{-q} + u_1t^{-2q} + u_0t^{-3q}).$$

In accordance with the product rule the p.g.f. of the raw-score sum $s = (x_a + x_b)$ of independent 
a-fold and b-fold samples is 

$$G(x_a + x_b) = G_u^a . G_u^b = t^{(a+b)(m+3q)}(u_0t^{-3q} + u_1t^{-2q} \dots + u_1t^{2q} + u_0t^{3q})^{a+b}.$$

That of the raw-score difference $d = (x_a - x_b)$ is 

$$G(x_a - x_b) = G_u^a . G_{-u}^b = t^{(a-b)(m+3q)}(u_0t^{-3q} + u_1t^{-2q} \dots + u_1t^{2q} + u_0t^{3q})^{a+b}.$$

Thus we have 

$$G(x_a + x_b) = t^{2b(m+3q)} . G(x_a - x_b).$$

As we have seen, the only effect of the left-hand factor in the expression on the right is to change 
the origin of the distribution. This result is easy to confirm by recourse to the chessboard device. 
If the distribution is symmetrical, diagonal summation from left to right downwards is equivalent 
to diagonal summation downwards from right to left in the square grid ; and the student should 
be able to interpret the change of origin in terms of the distribution of the score deviations by 
drawing it. 
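
The property of symmetrical distributions stated above may be confirmed numerically. The following sketch assumes a particular symmetrical u.s.d. of 7 score classes (frequencies in the ratio 1 : 2 : 3 : 4 : 3 : 2 : 1, lowest score 2, step 3), all of which are figures chosen here for illustration ; it checks that the distribution of the sum of an a-fold and a b-fold sample score is that of their difference with the origin shifted by 2b times the middle score.

```python
from fractions import Fraction
from collections import defaultdict


def multiply(f, g):
    """Product of two g.f.s stored as {exponent of t: coefficient}."""
    out = defaultdict(Fraction)
    for ef, cf in f.items():
        for eg, cg in g.items():
            out[ef + eg] += cf * cg
    return dict(out)


def power(f, r):
    """r-th power of a g.f."""
    out = {0: Fraction(1)}
    for _ in range(r):
        out = multiply(out, f)
    return out


# Assumed symmetrical unit sample distribution: scores 2, 5, 8, ..., 20.
LOWEST, STEP = 2, 3
FREQ = [Fraction(k, 16) for k in (1, 2, 3, 4, 3, 2, 1)]
g_u = {LOWEST + STEP * x: f for x, f in enumerate(FREQ)}
g_neg = {-(LOWEST + STEP * x): f for x, f in enumerate(FREQ)}

a, b = 2, 3
total = multiply(power(g_u, a), power(g_u, b))    # sum of the two sample scores
diff = multiply(power(g_u, a), power(g_neg, b))   # their difference

# Change of origin: 2b times the middle score (LOWEST + 3 * STEP).
shift = 2 * b * (LOWEST + 3 * STEP)
shifted = {e + shift: c for e, c in diff.items()}

print(shifted == total)
```

If the frequencies are made unsymmetrical, e.g. by altering a single entry of `FREQ` and rescaling, the two distributions no longer coincide after the change of origin.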


EXERCISE 11.08 


1. A card pack contains only equal numbers of cards of the following denominations : ace of hearts, 
2 of clubs, 3 of spades, the player’s score being the total number of pips regardless of suit. Write down 
the distribution of the 3-fold sample score-sum and that of the difference between 2-fold samples on 
the assumption of replacement, and check the result by recourse to the chessboard procedure. 


2. Derive by means of the p.g.f. the frequency of the following mean scores for a 3-fold toss of 
the tetrahedral dice with faces as specified : 


Mean Score Faces of die 
4 5.5 
5 A ae 
10 2, 6, 10; 14 


Check the results by the grid procedure. 
3. Specify distributions of the difference between both the total score and the mean score of the 
3-fold and the 2-fold toss of each of the dice in Example 2 above. 


4. By use of the g.f. involving the dummy factor ¢ establish the following conclusions w.r.t. 
sampling with replacement from a 2-class universe (e.g. red or black cards of a full pack) when p = 1/2 = q : 


(a) the raw-score difference distribution about the mean for a-fold and b-fold samples is the same 
as that of the (a + b)-fold score-sum ; 


(b) the proportionate score difference is identical with that of the raw-score about its mean. 


5. When p and q are not equal in Example 3, show that 
(a) the raw-score deviation difference distribution is identical with that of the proportionate score if 
the size of the sample is equal ; 


(b) the distribution about the mean of the score-sum for samples of 2a cards is the same as the dis- 
tribution of the sum of the differences between a pairs. 


6. For an infinite 3-class universe of score values —1, 0 and 1, write out the distribution of the 
3-fold sample mean score on the assumption that the ratios of the score class frequencies are (a) 1:1:1; 
(b) 1:2:1; (c) 1:4:1. Check by the chessboard procedure. 


CHAPTER 12 


MODELS OF BIVARIATE UNIVERSES 


12.00 MEANING OF THE MODELS 


ABOUT the turn of the century Karl Pearson adapted methods of line fitting prescribed by 
Legendre and Gauss for the evaluation of physical constants based on laboratory experiments 
to the description of concomitant measurements of relatives. The ostensible end in view was 
to develop certain confused and erroneous beliefs about inheritance propounded by Francis 
Galton, in deference to whose mystique the Gaussian Method of Least Squares asserted its 
claims in a new context under a new name as regression. Pearson's new contribution was the 
announcement of a measure of association commonly called the correlation coefficient or more 
precisely the product-moment index. With its aid he claimed to have established on a firm 
footing Galton’s so-called Law of Ancestral Inheritance. This generalisation is meaningful in 
one sense which is true but too trite to have any claims to novelty or to utility. In any other 
sense, it is demonstrably false. 

Through Bowley the new evangel of correlation spread to the social sciences, expounded 
against a background of geometrical concepts which defy any attempt to make explicit the 
manifold circumstances in which co-variation may arise. Not unnaturally the social sciences 
have therefore laboured under a load of misconceptions from which the revival of Mendel’s 
experimental method rescued the study of heredity in plants and animals. To make explicit 
circumstances relevant to a correct assessment of observed correlation is therefore a task of no 
mean importance. It is indeed a simple matter, if we examine different types of model 
situations. By examining one such class of models in Chapter 9 of Vol. I, we have seen that 
correlation in the statistical sense entails no necessary conclusions about the causal nexus involved 
in the events recorded. We shall appreciate this more clearly if we now take stock of some very 
diverse situations in which correlation can arise. Such is our chief concern in what follows ; but 
the situations we are about to explore may prove to be misleading if we do not clearly appre- 
ciate in what sense each model dealt with is a universe in contradistinction to a sample such as 
we meet in sociological or biological research. 

In our first approach to statistical theory it is appropriate to regard the structure of the 
universe (e.g. a card pack) as the source of our information about samples (e.g. hands at bridge) 
drawn therefrom ; but we have anticipated a different viewpoint in so far as we have found it 
(a) necessary to draw a sharp distinction between die or lottery wheel models and card pack or 
urn models in Chapter 2 of Vol. I ; (b) convenient to speak of the score-frequency specification 
of the universe as the unit sample distribution. We shall be better able to appreciate the lesson 
of the models dealt with in this chapter if we first re-examine our use of the terms universe and 
sample. 

From a static viewpoint we may usefully distinguish between universes of 3 kinds : 


(i) discrete and finite, if there is a finite number of score classes each with a finite number 
of identical score values, e.g. a full pack of 52 cards of which 13 (score value 1) are hearts 
and 39 (score value 0) are of other suits ; 


(ii) discrete and infinite, if we pool an infinite number of full card packs, in which event 
there is a finite number of score classes each with an infinite number of identical score 
values, subject to the understanding that the ratio of two such infinite numbers is 
specifiable and finite ; 

(iii) continuous, in the sense that the number of score classes is infinite and the number of 
items in each class is infinite though not necessarily equivalent on that account. 


The last named is a convenient mathematical fiction which at least serves a useful purpose 
as a means of simplifying laborious computation with sufficient accuracy for practical purposes, 
e.g. when we invoke the normal curve to specify the distribution of large samples (Chapter 3 in 
Vol. I) from a 2-class universe of type (ii) above. Likewise it is often a convenient device for 
specifying the structure of a discrete universe of the same type when the number of score classes 
is very large. Whether it is more than a fiction is open to philosophic doubt ; and we are on 
solid ground only if we confine our attention in this context to (i) and (ii). 


If we define our universe as both finite and discrete, we cannot specify the results of sampling 
from it unless we agree at the outset concerning whether the sampling process does or does not 
involve replacement of each item chosen before taking another. If we impose the condition 
of replacement, the distinction between (i) and (ii) ceases to be relevant from a mathematical 
viewpoint, since sampling with replacement from one full card pack is equivalent to sampling 
without replacement from an infinite number of full card packs. 
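
The equivalence can be checked numerically. The following is only an illustrative sketch, not part of the original exposition: it assumes a 5-fold sample scored by its number of hearts, and compares the non-replacement distribution for a pool of several full packs with the replacement distribution for a single pack (p = 1/4). The function names and the sample size are choices made for the example.

```python
from fractions import Fraction
from math import comb

def without_replacement(k, r, packs):
    """P(k hearts in an r-fold sample drawn without replacement from `packs` pooled packs)."""
    hearts, others = 13 * packs, 39 * packs
    return Fraction(comb(hearts, k) * comb(others, r - k), comb(hearts + others, r))

def with_replacement(k, r):
    """P(k hearts in an r-fold sample drawn with replacement from one full pack)."""
    p = Fraction(13, 52)
    return comb(r, k) * p**k * (1 - p) ** (r - k)

def max_gap(r, packs):
    """Largest difference between the two distributions over all scores 0..r."""
    return max(abs(without_replacement(k, r, packs) - with_replacement(k, r))
               for k in range(r + 1))

# The gap shrinks as the number of pooled packs grows.
for packs in (1, 10, 100, 1000):
    print(packs, float(max_gap(5, packs)))
```

As the pool grows, the printed gap falls towards zero, which is the sense in which the infinite pool and the single pack sampled with replacement are interchangeable.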

When our model universe is an urn or a card pack, we are always free to regard it as an 
entity in its own right on the assumption that we are free to sample one way or the other; but 
we cannot appropriately conceive the model universe of the die or lottery wheel in this way. 
By its very nature, such a model is a widow’s cruse. However often we toss a cubical die, it is 
still possible to score a six at the next trial. Thus, the structure of the model is such that we must 
in effect impose the replacement condition on the sampling process and therefore conceive the 
universe of the model as a universe of type (ii) ; but in making any such statement about our model 
universe we have changed our viewpoint. If we view a penny as a static entity, we are entitled 
to regard it as a 2-class universe with 2 score values ; but such a picture of the universe might 
lead us to wrong conclusions about its behaviour, if the penny were biased. To visualise it 
correctly with that end in view, we must think of it as a universe in action. We have then to 
conceive it as a 2-class universe with an infinite number of score values, the ratio of alternative 
score values being unity if the penny is unbiased.

To say that we must so conceive it as a universe in action is to say that we have reversed the 
more naive procedure of deducing the nature of the sampling distribution from the structure of 
the universe. We are now conceiving the nature of the universe in terms of the sampling 
process. This is the readjustment we have to make, if we seek to visualise sampling in what we shall 
later define as a correlation universe. To do this, let us recall a simple example of the class of 
model situations dealt with in Chapter 9. The umpire tosses a coin twice, and each of two players 
(A and B) tosses once, each adding the umpire's score (heads) to his (or her) own individual 
score. We may summarise grid-wise the players' joint score distribution in an indefinitely 
large number of trials as below (as in 11.01, p. 429):


                     x_a
              0    1    2    3
         0    1    1    0    0
  x_b    1    1    3    2    0
         2    0    2    3    1
         3    0    0    1    1
Each set of border-scores and the corresponding row or column totals (not shown above) of such 
a grid defines a univariate (single score) unit sample distribution, being thus a summary of 
the relative frequencies of recording each score value in an infinite number of trials. In other 
words, each border-score distribution defines a particular universe of scores. The cell entries 
of the grid exhibit how often a score of one set turns up with a score of the other set in the long 
run. As such, they summarise the relative frequencies of the paired scores (x_a, x_b) in an infinite 
number of trials, each cell entry (divided by the grand total) being the probability of getting a 
particular paired score at a single trial. This is what we mean when we speak of a bivariate 
unit-sample distribution. Since it summarises the outcome of an infinite number of independent 
trials, such a distribution describes a universe from which we may extract samples of any number 
on the assumption that the particular paired score-value obtained at one trial (unit sample of 
paired scores) does not affect the paired score-value obtained at the next. To say that the 
universe is infinite in this context is, of course, consistent with saying that it contains a finite 
number of classes. In the same sense, we have seen that the universe of the common cubical 
die is both discrete and infinite. The number of faces defines the relative frequencies of six 
classes each with an infinite number of items available for withdrawal as sample values. 
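
The grid of the bonus model can be rebuilt by listing every equally likely outcome of the four tosses. The sketch below is an illustration only; the names `COIN` and `bonus_grid` are invented for the purpose, and heads is scored as 1.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

COIN = (0, 1)  # tails = 0, heads = 1, equally likely

def bonus_grid():
    """Count how often each paired score (x_a, x_b) arises among all equally likely outcomes."""
    counts = Counter()
    for u1, u2, a, b in product(COIN, repeat=4):
        bonus = u1 + u2                      # the umpire's two tosses
        counts[(bonus + a, bonus + b)] += 1  # each player adds the bonus to one toss of his own
    return counts

grid = bonus_grid()
total = sum(grid.values())                   # number of equally likely outcomes

# One printed line per value of x_b, with x_a running across.
for xb in range(4):
    print(xb, [grid[(xa, xb)] for xa in range(4)])

# Border (univariate) distribution of A's score as relative frequencies.
border_a = [Fraction(sum(grid[(xa, xb)] for xb in range(4)), total) for xa in range(4)]
print(border_a)
```

Dividing each cell count by the grand total gives the probability of the corresponding paired score at a single trial, and the border totals give the two univariate unit sample distributions.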

Only a sample composed of 16 or some exact multiple of 16 paired scores could be specifiable 
by exactly the same proportionate cell entries as the bivariate universe of the bonus model 
mentioned above ; and this would be a very rare occurrence. In general, the sample structure 
prescribed by, say, 8 successive trials may be specified by filling in the cells with integers up to a 
total of 8 (or exact multiples of one-eighth up to a total of unity) with the proviso that certain 
cells whose corresponding theoretical frequencies are zero remain empty. In the above, these 
zero cells are, of course, defined by 0.2, 0.3, 1.3, 2.0, 3.0, 3.1. Thus 3 possible 8-fold samples 
are as below:


Of many possible 8-fold samples one might choose, the above illustrate the possibility that an 
actual sample may ring the changes on values of r_ab from perfect negative through zero to perfect 
positive correlation, and hence bring into focus a twofold problem about a correlation universe :


(a) what sample parameter is an unbiased estimate of the product-moment index r_ab in 
the sense that (r — 1)s? = ro? defines the unbiased r-fold sample estimate (p. 304, 
Vol. I) of the variance (c) of a univariate unit sample distribution, i.e. universe of single 
scores ? 


(b) how can we define the sample distribution of r_ab or other characteristic parameter, e.g. 
k_ba or Cov (x_a, x_b) ?


At this stage, we shall not attempt to answer these questions, stating them merely to 
emphasise the importance of the distinction between r_ab, etc. conceived as parameters of a correlation 
universe (unit-sample bivariate distribution) and as parameters of a particular sample of 
observations. In the exposition of the models in this chapter our concern is with the specification of 

Fig. 88. The Unit Sample distribution of a bivariate universe. Above : proportionate size of piles 
of chips for players' joint scores in a very long sequence of trials. Below : grid of relative 
frequency of players' joint scores in an endless sequence of trials, bordered by the score of 
player A (x_a) and the score of player B (x_b).



the former. The model may summarise the long-run outcome of all possible unit trials repeated 
indefinitely, and indeed (12.01-12.06) we can profitably conceive the structure of a bivariate 
universe only by such a backstage approach, viewing the summarising grid as a unit sample 
distribution in that sense. 

The grid lay-out itself is only one of several different ways of specifying such a universe. 
As we have seen in 11.01, we may do so for the universe of the model mentioned at the 
beginning of this section as follows : 


    x_a . x_b :   0.0    0.1    1.0    1.1    1.2    2.1    2.2    2.3    3.2    3.3

    y :           1/16   1/16   1/16   3/16   2/16   2/16   3/16   1/16   1/16   1/16

The arrangement shown above has one advantage in that it falls into line with the lay-out of a 
univariate unit sample distribution. The circumstance which distinguishes the universe of 
correlation from the universe of independence is that the same numerical values of x_a do not 
occur with equal frequency among different values of x_b as they would if the distribution were 
the following :

Fig. 89. Homoscedastic domino model (unit sample, with left score x_a and right score x_b). A 
bivariate universe which has zero covariance and homoscedasticity (equality of variance within 
arrays) in both dimensions.



It is important to recognise the implications of the fact that the two distributions set forth 
above each represent the distributions of independent unit samples, i.e. trials. When we speak 
of one as a correlation universe and the other as a universe of independence the distinction 
refers to the way in which the A-scores are distributed w.r.t. the B-scores (or vice versa) among 
unit samples of paired scores; but when we prescribe the relative frequencies of r-fold samples 
from a universe of either sort, we do so on the assumption that one trial is independent of another 
in conformity with the usual chessboard procedure. In short, we can visualise our bivariate universe 
as both finite and discrete, if we impose the condition of replacement on the process of sampling 
therefrom. 

We may suppose that the umpire has a large box of dominoes of 10 denominations, the two 
halves of each domino being respectively white with black pips and vice versa. Instead of writing 
down the score at each trial, the umpire may draw from the box a domino with the appropriate 
number of white pips on black for the total score of player A and of black pips on white for the 
total score of player B. As the contest goes on the pile of each of the ten types grows, and as 
the number of games becomes indefinitely large the proportions of dominoes in each pile become 
ever closer to the distribution assigned by the grid. From a formal point of view, the long-run 
result would be exactly the same if we recorded the result of taking dominoes with replacement 
one at a time from a box of only 10 dominoes of the same ten types in the same proportions 
(Fig. 88). This is a convenient visualisation inasmuch as it helps us to see one way of constructing 
a distribution of which (Fig. 89) we can predicate both zero covariance and homoscedasticity 
(equal variance within arrays) without imposing the additional restriction that the two score 
distributions are truly independent in the statistical sense of the term. 

One feature of Fig. 88 calls for special comment because it introduces a device which will 
later (Chapter 18) steer us through a maze of difficulties in the theory of sampling from a bivariate 
universe. Since each of the dominoes, i.e. players’ joint score chips, of Fig. 88 records the result 
of a single trial, we can visualise the trial as a grid with a single cell entry in the appropriate 
row and column, otherwise constructed like the summarising grid of the unit sample 
distribution (u.s.d.). Since each trial is independent of another, we can operate with such unit grids in 
accordance with the chessboard lay-out, successively deriving the distribution of 2-fold, 3-fold, 
etc. samples by applying the product rule, at each stage specifying the composition of the 
(r + 1)-fold sample of frequency so defined by adding cell by cell the entries of the unit grid to 
that of the r-fold sample grid. Thus the following shows the generation of a particular 2-fold 


sample : 
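
The chessboard step just described can also be sketched in code. What follows is a hedged illustration rather than the author's own lay-out: the dictionary `UNIT` encodes the unit sample distribution of the umpire bonus model of this section (paired scores with frequencies out of 16), and `next_fold` is a name invented here for one application of the product rule.

```python
from collections import Counter
from fractions import Fraction

# Unit sample distribution of the bonus model: paired score (x_a, x_b) -> probability.
UNIT = {
    (0, 0): Fraction(1, 16), (0, 1): Fraction(1, 16), (1, 0): Fraction(1, 16),
    (1, 1): Fraction(3, 16), (1, 2): Fraction(2, 16), (2, 1): Fraction(2, 16),
    (2, 2): Fraction(3, 16), (2, 3): Fraction(1, 16), (3, 2): Fraction(1, 16),
    (3, 3): Fraction(1, 16),
}

def next_fold(samples):
    """Chessboard step: combine every r-fold sample grid with every unit grid."""
    out = Counter()
    for grid, prob in samples.items():
        for pair, p in UNIT.items():
            cells = Counter(dict(grid))
            cells[pair] += 1                      # add the unit grid cell by cell
            key = tuple(sorted(cells.items()))    # the order of trials is immaterial
            out[key] += prob * p                  # product rule for independent trials
    return out

# A 1-fold sample grid has a single cell entry of 1.
one_fold = Counter({((pair, 1),): p for pair, p in UNIT.items()})
two_fold = next_fold(one_fold)
print(len(two_fold), sum(two_fold.values()))
```

Each key of `two_fold` is a sample grid written as a tuple of (cell, entry) pairs, and repeated calls of `next_fold` yield the 3-fold, 4-fold, etc. sample distributions in the same way.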




EXERCISE 12.00 


1. Each of a pack of cards carries 1, 2 or 3 hearts on one face and 1, 2 or 3 spades on the other in 


the following proportions : 


    1H 1S   p          2H 1S   s          3H 1S   v
    1H 2S   q          2H 2S   t          3H 2S   w
    1H 3S   r          2H 3S   u          3H 3S   z


Examine the properties of the unit sample distribution for the following values p, q, etc. with special 
reference to the row and column means, the row and column variances, the values of 7), a and Tap: 


p q r s t u v w z 
Pao ee RE E aoe | 
E Ce AA O E.. 
Gris T O a Ta 


2. Work out the distribution of 2-fold samples for (a)-(c) above. 
12.01 THE UMPIRE BONUS MODEL 


In Chapter 9 of Vol. I we have examined a type of correlation which arises between the scores 
(x_a and x_b) of two players A and B in a game of chance, if each receives a variable bonus (x_u) 
from a third player called the umpire. In Chapter 18 we shall rely on the same model to 
clarify the elements of the statistical procedure called Factor Analysis. For that reason, 
we shall now examine it in more general terms. Fig. 90 exhibits a simple variation of the 
model, viz. : 


Fig. 90. Umpire Bonus Model: linear regression in one dimension only. The umpire tosses twice the flat, 
circular die of Fig. 67, Vol. I. Player A tosses once the tetrahedral die of Fig. 70. Player B tosses once the 
tetrahedral die of Fig. 73. 


(i) The umpire tosses twice the die of Fig. 67, Chapter 7 ; 
(ii) Player A tosses once the die of Fig. 70, Chapter 7 ; 
(iii) Player B tosses twice the die of Fig. 73, Chapter 7. 
In such a situation we note that the scores (x_{a.0} and x_{b.0}) of the individual players A and B 
before addition of the bonus are strictly independent of one another and of the score of the 
umpire. We express this by the equations 

    x_a = x_u + x_{a.0}   and   x_b = x_u + x_{b.0}   . . . (i)

We may speak of (i) as a law of linear concomitant variation in contradistinction to a situation 
in which the law is curvilinear, e.g. cubic, as when A receives a bonus x_u and B receives a bonus 
x_u^3, i.e. 

    x_a = x_u + x_{a.0}   and   x_b = x_u^3 + x_{b.0}   . . . (ii)

By recourse to constants (A_u, A_0, B_u, B_0) we may subsume both (i) and (ii) under a more 
general expression 

    x_a = A_u x_u^p + A_0 x_{a.0}   and   x_b = B_u x_u^q + B_0 x_{b.0}   . . . (iii)

When p = 1 = q we may speak of the relation as linear, (i) being a special case arising when p, q, and 
the constants A_u, etc. are all equal to unity. Thus we might devise the following game in which 
the players' chance of success is equal and the law of concomitant variation is linear : 


(i) The umpire tosses twice the flat circular die of Fig. 67; player A tosses twice an ordinary 
cubical die and player B tosses once the tetrahedral die of Fig. 70, so that the mean scores of 
the umpire and individual players before receipt of bonus are respectively M_u = 3, M_{a.0} = 7 
and M_{b.0} = 2 ; 


(ii) Player A counts as his final score 4 times that of the umpire added to twice his individual score, 
B recording as his final score 6 times that of the umpire added to 4 times his own individual 
score. 

In this case the law of variation is 

    x_a = 4x_u + 2x_{a.0}   and   x_b = 6x_u + 4x_{b.0}.

For the means we have 

    M_a = 4M_u + 2M_{a.0} = 26 = 6M_u + 4M_{b.0} = M_b


Let us now derive the product-moment index appropriate to (iii) when p = 1 = q. We 
may then write 

    M_a = A_u M_u + A_0 M_{a.0}   and   M_b = B_u M_u + B_0 M_{b.0}   . . . (iv)
    V_a = A_u^2 V_u + A_0^2 V_{a.0}   and   V_b = B_u^2 V_u + B_0^2 V_{b.0}   . . . (v)

From (iv) we have 

    x_a − M_a = A_u(x_u − M_u) + A_0(x_{a.0} − M_{a.0})
    x_b − M_b = B_u(x_u − M_u) + B_0(x_{b.0} − M_{b.0})

In conformity with the convention used elsewhere we may write these in terms of score 
deviations as 

    X_a = A_u X_u + A_0 X_{a.0}   and   X_b = B_u X_u + B_0 X_{b.0}   . . . (vi)

    ∴ X_a X_b = A_u B_u X_u^2 + A_u B_0 X_u X_{b.0} + A_0 B_u X_u X_{a.0} + A_0 B_0 X_{a.0} X_{b.0}

    E(X_a X_b) = A_u B_u E(X_u^2) + A_u B_0 E(X_u X_{b.0})
               + A_0 B_u E(X_u X_{a.0}) + A_0 B_0 E(X_{a.0} X_{b.0})

    Cov (x_a, x_b) = A_u B_u V_u + A_u B_0 Cov (x_u, x_{b.0})
                   + A_0 B_u Cov (x_u, x_{a.0}) + A_0 B_0 Cov (x_{a.0}, x_{b.0})

Since x_{a.0}, x_{b.0} and x_u are independent, their covariances are zero, and 

    Cov (x_a, x_b) = A_u B_u V_u
When A_u = 1 = B_u this reduces to Cov (x_a, x_b) = V_u as we have seen in Chapter 9 of Vol. I. 
For the more general case we may write 

    r_ab = A_u B_u V_u / √(V_a V_b) = A_u B_u σ_u^2 / (σ_a σ_b)   . . . (vii)

To determine the covariance of the players' score and that of the umpire we note that 

    X_a X_u = A_u X_u^2 + A_0 X_u X_{a.0}
    ∴ Cov (x_a, x_u) = A_u E(X_u^2) + A_0 E(X_u X_{a.0}) = A_u V_u

Whence we obtain 

    r_au = A_u σ_u / σ_a   . . . (viii)

Similarly, we derive 

    r_bu = B_u σ_u / σ_b
    ∴ r_au . r_bu = A_u B_u σ_u^2 / (σ_a σ_b) = r_ab   . . . (ix)
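
The relations Cov (x_a, x_b) = A_u B_u V_u and r_au . r_bu = r_ab can be checked by exhaustive enumeration. The sketch below is illustrative only: the choice of dice (a cubical die for the umpire, and the 1:2:1 and 1:3 tetrahedral dice for the players) and the multipliers 4, 2, 6, 4 are assumptions made for the example, and the helper names are invented. Squared correlations are compared so that the arithmetic stays exact.

```python
from fractions import Fraction
from itertools import product

def moments(dist):
    """Mean and variance of a discrete distribution given as {score: probability}."""
    mean = sum(x * p for x, p in dist.items())
    return mean, sum((x - mean) ** 2 * p for x, p in dist.items())

def covariance(joint, i, j):
    """Covariance of components i and j of a joint distribution {tuple: probability}."""
    mi = sum(s[i] * p for s, p in joint.items())
    mj = sum(s[j] * p for s, p in joint.items())
    return sum((s[i] - mi) * (s[j] - mj) * p for s, p in joint.items())

# Illustrative dice (assumed for this sketch).
UMPIRE = {k: Fraction(1, 6) for k in range(1, 7)}                  # cubical die, one toss
DIE_A = {1: Fraction(1, 4), 2: Fraction(1, 2), 3: Fraction(1, 4)}  # scores 1, 2, 3 in ratio 1:2:1
DIE_B = {1: Fraction(1, 4), 2: Fraction(3, 4)}                     # scores 1, 2 in ratio 1:3
A_U, A_0, B_U, B_0 = 4, 2, 6, 4                                    # illustrative multipliers

# Joint distribution of (x_a, x_b, x_u) over all independent outcomes.
joint = {}
for (u, pu), (a, pa), (b, pb) in product(UMPIRE.items(), DIE_A.items(), DIE_B.items()):
    key = (A_U * u + A_0 * a, B_U * u + B_0 * b, u)
    joint[key] = joint.get(key, 0) + pu * pa * pb

v_u = moments(UMPIRE)[1]
cov_ab = covariance(joint, 0, 1)
var_a, var_b = covariance(joint, 0, 0), covariance(joint, 1, 1)
r_ab_sq = cov_ab**2 / (var_a * var_b)
r_au_sq = covariance(joint, 0, 2) ** 2 / (var_a * v_u)
r_bu_sq = covariance(joint, 1, 2) ** 2 / (var_b * v_u)
print(cov_ab, A_U * B_U * v_u, r_ab_sq == r_au_sq * r_bu_sq)
```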


In connexion with the derivation of the equation of partial correlation and the meaning of the 
“ reliability ” coefficient in 18.06 below we may usefully elaborate our model, as in the following 
examples : 


Example 1.—Umpire U tosses an ordinary cubical die once. Umpire W tosses a coin twice 
(scoring heads as 1 and tails as 0). Player A tosses the tetrahedral die of Fig. 70 (scores 1, 2, 3 in the 
ratio 1:2:1). Player B tosses once the tetrahedral die of Fig. 73 (scores 1 and 2 in the ratio 1 : 3). 
The final score of A is twice his individual score added to 3 times the score of umpire U and four times 
the score of umpire W. The final score of B is three times his individual score together with twice 
that of umpire U and 5 times that of umpire W, i.e. 


    x_a = 3x_u + 4x_w + 2x_{a.0} ; 
    x_b = 2x_u + 5x_w + 3x_{b.0}. 
Example 2.—Umpire U tosses a cubical die once. Player A first tosses the die of Fig. 67 twice, 
and then tosses once the die of Fig. 70, counting as his individual score 4 times the result obtained with 
the circular die added to the result of tossing the tetrahedral die. Player B tosses the die of Fig. 73 
once, and adds to 5 times his score the result of tossing a coin twice, the sum being his individual score. 
The final score of A is three times that of the umpire added to his individual score. B adds to his 
individual score twice that of the umpire, i.e. 
Xa = 3x + Axu + Kae; 
Xy = Wy + Ops T Xoe 
In Example 1, we have two independent umpires, and we may write the general pattern as 

    x_a = A_u x_u + A_w x_w + A_0 x_{a.0} ;   x_b = B_u x_u + B_w x_w + B_0 x_{b.0}   . . . (x)

In this set-up 

    M_a = A_u M_u + A_w M_w + A_0 M_{a.0} ;   M_b = B_u M_u + B_w M_w + B_0 M_{b.0} ;
    V_a = A_u^2 V_u + A_w^2 V_w + A_0^2 V_{a.0} ;   V_b = B_u^2 V_u + B_w^2 V_w + B_0^2 V_{b.0}.

Hence we may write 

    X_a = A_u X_u + A_w X_w + A_0 X_{a.0} ;   X_b = B_u X_u + B_w X_w + B_0 X_{b.0}

    Cov (x_a, x_b) = E(X_a . X_b)
        = A_u B_u E(X_u^2) + A_w B_w E(X_w^2) + (A_u B_w + A_w B_u) E(X_u . X_w)
        + A_u B_0 E(X_u . X_{b.0}) + A_w B_0 E(X_w . X_{b.0})
        + A_0 B_u E(X_{a.0} . X_u) + A_0 B_w E(X_{a.0} . X_w)
        + A_0 B_0 E(X_{a.0} . X_{b.0}),

    ∴ Cov (x_a, x_b) = A_u B_u V_u + A_w B_w V_w
    ∴ r_ab = (A_u B_u V_u + A_w B_w V_w) / (σ_a σ_b)   . . . (xi)


Similarly we get 

    Cov (x_a, x_u) = A_u V_u   and   Cov (x_a, x_w) = A_w V_w
    ∴ r_au = A_u σ_u / σ_a   and   r_aw = A_w σ_w / σ_a

Likewise 

    r_bu = B_u σ_u / σ_b   and   r_bw = B_w σ_w / σ_b


Whence we obtain 

    r_au . r_bu + r_aw . r_bw = A_u B_u σ_u^2 / (σ_a σ_b) + A_w B_w σ_w^2 / (σ_a σ_b) = r_ab   . . . (xii)
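
The two-umpire pattern can be checked against the set-up of Example 1 by exhaustive enumeration. The following sketch is offered as an illustration under the dice specifications given in the example; the helper names `expect` and `cov` are invented here.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

U = {k: Fraction(1, 6) for k in range(1, 7)}                    # umpire U: cubical die, one toss
W = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}   # umpire W: heads in two tosses of a coin
A0 = {1: Fraction(1, 4), 2: Fraction(1, 2), 3: Fraction(1, 4)}  # player A: scores 1, 2, 3 in ratio 1:2:1
B0 = {1: Fraction(1, 4), 2: Fraction(3, 4)}                     # player B: scores 1, 2 in ratio 1:3

def expect(f):
    """Expectation of f(u, w, a, b) over all independent outcomes."""
    return sum(pu * pw * pa * pb * f(u, w, a, b)
               for (u, pu), (w, pw), (a, pa), (b, pb)
               in product(U.items(), W.items(), A0.items(), B0.items()))

def x_a(u, w, a, b):
    return 3 * u + 4 * w + 2 * a     # final score of A in Example 1

def x_b(u, w, a, b):
    return 2 * u + 5 * w + 3 * b     # final score of B in Example 1

def cov(f, g):
    """Covariance of two score functions; cov(f, f) is the variance of f."""
    mf, mg = expect(f), expect(g)
    return expect(lambda *s: (f(*s) - mf) * (g(*s) - mg))

v_u = cov(lambda u, w, a, b: u, lambda u, w, a, b: u)
v_w = cov(lambda u, w, a, b: w, lambda u, w, a, b: w)
cov_ab = cov(x_a, x_b)
r_ab = float(cov_ab) / sqrt(cov(x_a, x_a) * cov(x_b, x_b))
print(expect(x_a), cov(x_a, x_a), expect(x_b), cov(x_b, x_b), cov_ab, round(r_ab, 3))
```

The covariance obtained in this way involves only the two umpire terms, since every product involving an individual score has zero expectation.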


We may accommodate negative correlation in the simpler set-up defined by (vi) if A_u and B_u 
have opposite signs, as when A adds the appropriate bonus to his individual score, and B deducts 
it therefrom. The covariance will then be negative. The set-up defined by (x) admits the 
possibility of zero covariance, if A_u B_u V_u = − A_w B_w V_w in (xi). If the umpires toss the same 
die the same number of times, so that V_u = V_w, zero covariance therefore implies that 

    A_u / A_w = − B_w / B_u


The orthogonal lottery model of Fig. 13 in 12.07 below is a particular case, viz. 
A =B.=t=As By = += Ae 


In the jargon of factor analysis, Example 1 illustrates a situation involving two scores sharing 
two common factors. The second example illustrates a situation involving a common factor 
and two independent specific factors. Example 2 is of the general pattern : 


Rg As HARO, PARAS and AA F BG HB . (xiii) 


The reader will easily see that 
Cou {x,, 2) = AB : E i i . 0 


V, = AV, + A. + Ai; and Ko = BV, + BV. BA. 
An important generalisation developed in 18.06 below emerges from a modification of this 
pattern when the player plays against himself by repeating at each trial his prescribed number of 
tosses with the third die, and adding the result of the single toss of the first and of the second to 
each. If the player (A) does this, he has at each trial 2 scores : 

a Sas Bahr, Sa Agha. ET AP 
Xg = A > Aske: or E a: 98 
This conforms to the pattern of Example 1 with the special condition that V,., = V,=V.,.>», 
and we may write 
| Cov (%,, x2) = AZV, + A?V,, 
EER Ar. == AV, oe AV, Bs V, . . . (xvi) 
Whence we have 
a AV, + AV, 
© AV + ABV, + ABV, 
Our numerical examples in Chapter 9 of Vol. I have shown that linear regression in neither 
dimension of the grid is a necessary consequence of linear concomitant variation in the domain 
of the concurrent relationship between the scores of the two players when Example 1 defines the 
set-up. Contrariwise, they also show that there is always linear regression of the player’s score 
on that of the umpire. To show that this is always true, we shall first consider the case which 
arises when: (a) the player’s individual score and that of the umpire each increase by unit steps 
from zero upwards; (b) the definitive equation of the A-score is x_a = x_u + x_{a.0}. For this 
set-up Cov (x_a, x_u) = V_u and 

    r_au = V_u / (σ_a σ_u) = σ_u / σ_a   . . . (xvii)

If regression is linear, the identity defined by (vii) of 11.04 then implies that 

    k_au = r_au . σ_a / σ_u = 1   . . . (xviii)

Let us first recall the build-up of the correlation grid for the concomitant score distribution of 
player and umpire by reference to a simple example, viz.: the player A tosses a penny twice 
and the umpire four times, the player's total score being the sum of the player's individual score 
and that of the umpire. If p is the probability of scoring a success in a single toss, we may 
denote the distribution as follows : 

    Score               Frequency
              Umpire's score      Player's individual score
      0       u_0 = q^4           a_0 = q^2
      1       u_1 = 4pq^3         a_1 = 2pq
      2       u_2 = 6p^2q^2       a_2 = p^2
      3       u_3 = 4p^3q
      4       u_4 = p^4

For a fixed umpire score of 0, 1, 2, etc. the total A-scores run (0, 1, 2), (1, 2, 3), (2, 3, 4), etc., 
or more generally for a U-score of r the range is from r to (r + 2) with weighted frequencies 
u_r . a_0, u_r . a_1 and u_r . a_2. The frequency grid is therefore as follows. 
                                    A's Total Score

                         0       1         2          3          4         5       6
                   0    q^6    2pq^5    p^2q^4        0          0         0       0
                   1     0     4pq^5    8p^2q^4    4p^3q^3       0         0       0
  Umpire's Bonus   2     0       0      6p^2q^4   12p^3q^3    6p^4q^2      0       0
                   3     0       0         0       4p^3q^3    8p^4q^2    4p^5q     0
                   4     0       0         0          0        p^4q^2    2p^5q    p^6
  A-score frequency     q^6    6pq^5   15p^2q^4   20p^3q^3   15p^4q^2    6p^5q    p^6
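
The grid can be verified numerically for any particular value of p. The sketch below is an illustration only; it takes p = 0.3 as an arbitrary value, and the names `binomial` and `bonus_grid` are invented for the purpose. It confirms that the column totals are the terms of the binomial for six tosses and prints the mean A-score of each row.

```python
from math import comb, isclose

def binomial(n, p):
    """Frequencies of 0..n successes in n tosses with success probability p."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def bonus_grid(u, a):
    """Rows: umpire's score r; columns: player's total score r + c; cell: u_r * a_c."""
    width = len(u) + len(a) - 1
    grid = [[0.0] * width for _ in u]
    for r, ur in enumerate(u):
        for c, ac in enumerate(a):
            grid[r][r + c] = ur * ac
    return grid

p = 0.3                                   # arbitrary illustrative value
u, a = binomial(4, p), binomial(2, p)     # umpire: 4 tosses; player: 2 tosses
grid = bonus_grid(u, a)

column_totals = [sum(row[t] for row in grid) for t in range(len(grid[0]))]
row_means = [sum(t * f for t, f in enumerate(row)) / sum(row) for row in grid]
print(column_totals)
print(row_means)
```

Each successive row mean exceeds its predecessor by unity, so that the mean A-score of a row is the umpire's score plus the constant mean (2p) of the player's individual score.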


When the two sets of independent scores both increase by unit steps from zero origin and 
the player's total score is simply the sum of his individual score and that of the umpire, we may 
generalise as below the preceding pattern to accommodate situations involving the use of different 
dice by the player and the umpire : 

                              A's Total Score

                       0          1          2          3          4
                 0   u_0 a_0    u_0 a_1    u_0 a_2    u_0 a_3    u_0 a_4
  Umpire's       1      -       u_1 a_0    u_1 a_1    u_1 a_2    u_1 a_3
  Score          2      -          -       u_2 a_0    u_2 a_1    u_2 a_2

If the player's individual score increases from 0 to z by unit steps, its mean value is 

    M_{a.0} = a_0(0) + a_1(1) + a_2(2) . . . a_z(z) = Σ_{c=0}^{z} a_c . c

By definition also 

    a_0 + a_1 + a_2 . . . a_z = Σ_{c=0}^{z} a_c = 1

For the rth row of the grid the mean A-score (M_{a.r}) is 

    M_{a.r} = [u_r . a_0(r) + u_r . a_1(r + 1) + u_r . a_2(r + 2) . . . u_r . a_z(r + z)]
              / [u_r . a_0 + u_r . a_1 + u_r . a_2 . . . u_r . a_z]
            = a_0(r) + a_1(r + 1) + a_2(r + 2) . . . a_z(r + z)
            = r Σ a_c + Σ a_c . c
            = r + M_{a.0}   . . . (xix)


In this expression r is the umpire's score and M_{a.0} is constant. Thus the mean A-score of a 
row is a linear function of the border U-score. To arrive at this conclusion we have assumed 
that (a) both the score of the umpire and the individual score of the player increase by unit steps 
from zero origin ; (b) constants A_u and A_0 are each equal to unity in the definitive equation 

    x_a = A_u x_u + A_0 x_{a.0}   . . . (xx)

We shall remove these restrictions by postulating that 


(a) the constants A_u and A_0 may have any value ; 
(b) the increments Δx_u and Δx_{a.0} may have any constant value ; 
(c) the origin of the distributions of x_u and x_{a.0} may have any value m_u and m_0 respectively. 
To make use of the foregoing schema, we then label the frequency of the player's individual 
score as a_0 when its value is m_0 and a_c when its value is (m_0 + c . Δx_{a.0}). Similarly, u_r is the 
frequency of a U-score equal to (m_u + r . Δx_u). In this symbolism a cell frequency u_r . a_c is 
that of a player's total score : 

    A_u(m_u + r . Δx_u) + A_0(m_0 + c . Δx_{a.0}).

For brevity, we may write 

    K_{a.r} = A_u . m_u + A_u . r . Δx_u + A_0 . m_0   . . . (xxi)

Thus the cell frequency u_r . a_c defines a total player's score equivalent to 

    K_{a.r} + A_0 . c . Δx_{a.0}   . . . (xxii)

The mean A-score for a fixed value of the umpire's score (x_u = r) is then 

    M_{a.r} = Σ_{c=0}^{z} a_c (K_{a.r} + A_0 . c . Δx_{a.0})
            = K_{a.r} Σ_{c=0}^{z} a_c + A_0 . Δx_{a.0} Σ_{c=0}^{z} a_c . c
    ∴ M_{a.r} = K_{a.r} + A_0 . Δx_{a.0} Σ_{c=0}^{z} a_c . c   . . . (xxiii)


When the origins of both distributions are zero we may write for the mean of the player's 
individual score M_{a.0} as elsewhere. Otherwise, the symbol is ambiguous. Here we shall 
write it as M_0, so that 

    M_0 = Σ_{c=0}^{z} a_c (m_0 + c . Δx_{a.0}) = m_0 + Δx_{a.0} Σ_{c=0}^{z} a_c . c

By substitution in (xxiii) we now have 

    M_{a.r} = K_{a.r} + A_0 . M_0 − A_0 . m_0.

Whence from (xxi) above 

    M_{a.r} = A_u . m_u + A_0 . M_0 + A_u . Δx_u . r   . . . (xxiv)


In this expression every term except r is a constant of the distribution involved; and M_{a.r}, 
the mean value of the player's total score for a fixed value (x_u = r) of the umpire's score, is a 
linear function of r. 
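
The general result can be checked for any particular choice of constants. The sketch below is a hedged illustration: the frequencies, multipliers, origins and increments are hypothetical values chosen for the example, and `row_mean` and `formula` are names invented here. The first function averages the total score over a row directly; the second evaluates the closed form A_u . m_u + A_0 . M_0 + A_u . Δx_u . r.

```python
from fractions import Fraction

def row_mean(r, a_freq, A_u, A_0, m_u, m_0, dx_u, dx_0):
    """Mean total score of the player when the umpire's score is m_u + r*dx_u."""
    return sum(f * (A_u * (m_u + r * dx_u) + A_0 * (m_0 + c * dx_0))
               for c, f in enumerate(a_freq))

def formula(r, a_freq, A_u, A_0, m_u, m_0, dx_u, dx_0):
    """Closed form: A_u*m_u + A_0*M_0 + A_u*dx_u*r, with M_0 the mean individual score."""
    M_0 = sum(f * (m_0 + c * dx_0) for c, f in enumerate(a_freq))
    return A_u * m_u + A_0 * M_0 + A_u * dx_u * r

# Hypothetical set-up: individual scores 5, 8, 11, 14 with frequencies 1:3:3:1.
a_freq = [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]
params = dict(A_u=3, A_0=2, m_u=-1, m_0=5, dx_u=Fraction(1, 2), dx_0=3)

means = [row_mean(r, a_freq, **params) for r in range(5)]
print(means)
```

Successive row means differ by the constant A_u . Δx_u, which is what linearity of regression of the player's score on that of the umpire requires.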


The Redundant Umpire 


We shall later see that the umpire bonus model embodies the statistical postulates of the statistical 
technique known as factor analysis. In terms of our model, factor analysis seeks an answer to 
questions of two kinds: (a) how many umpires must we invoke to explain the inter-correlations 
of the scores of a team of players ; (b) what is the proportionate contribution of each umpire 
to the variance of the distribution of the scores of each player? Here we may foreshadow an 
answer to (a), leaving the discussion of its practical importance till later. 

From a factual point of view there is a clear-cut distinction between a set-up involving the 
contribution of 2 umpires (U and W) and one which involves a single umpire (Z) ; but the two 
situations may be algebraically identical, as is easy to see when: (a) the umpires U and W 
respectively toss the same die u and w times; (b) each player adds the actual score of each 


umpire to his individual score, so that A_u = 1 = B_u and A_w = 1 = B_w. In these 
circumstances we may denote the mean and variance of the unit sample distribution of the die 
of the umpires respectively by M_d and by V_d, so that M_u = u . M_d ; V_u = u . V_d and 
M_w = w . M_d ; V_w = w . V_d ; whence 

    M_u + M_w = (u + w)M_d   and   V_u + V_w = (u + w)V_d

The covariance of the joint distribution of the players is therefore the same as if a single umpire 
Z tossed the same die z = (u + w) times, since then 

    V_z = z . V_d = Cov (x_a, x_b) = (u + w)V_d = V_u + V_w.


Let us now remove the restrictions imposed above w.r.t. the nature of the dice and the equal 
allocation of the bonuses. For simplicity, we may take the origin of all score distributions as 
the mean, so that replacement implies that 

    A_u X_u + A_w X_w + A_0 X_{a.0} = X_a = A_z X_z + A_0 X_{a.0}
    B_u X_u + B_w X_w + B_0 X_{b.0} = X_b = B_z X_z + B_0 X_{b.0}   . . . (xxvi)

For such a system we may write 

    A_u^2 V_u + A_w^2 V_w + A_0^2 V_{a.0} = V_a = A_z^2 V_z + A_0^2 V_{a.0},
    B_u^2 V_u + B_w^2 V_w + B_0^2 V_{b.0} = V_b = B_z^2 V_z + B_0^2 V_{b.0},
    ∴ A_u^2 V_u + A_w^2 V_w = A_z^2 V_z   and   B_u^2 V_u + B_w^2 V_w = B_z^2 V_z   . . . (xxvii)
    A_u B_u V_u + A_w B_w V_w = Cov (x_a, x_b) = A_z B_z V_z   . . . (xxviii)

Let us now write 

    A_w^2 V_w = K_a A_u^2 V_u ;   B_w^2 V_w = K_b B_u^2 V_u ;   A_w B_w V_w = K_ab A_u B_u V_u   . . . (xxix)

From (xxvii) we then have 

    A_u^2 (1 + K_a) V_u = A_z^2 V_z   and   B_u^2 (1 + K_b) V_u = B_z^2 V_z
    ∴ A_u σ_u √(1 + K_a) = A_z σ_z   and   B_u σ_u √(1 + K_b) = B_z σ_z   . . . (xxx)
    ∴ A_z B_z V_z = A_u B_u V_u √((1 + K_a)(1 + K_b)).

Also from (xxviii) and (xxix) 

    A_z B_z V_z = A_u B_u V_u (1 + K_ab).


Hence the condition of replaceability of the two umpires by one umpire without affecting the 
intercorrelation of the players' score distribution is 

    (1 + K_ab)^2 = (1 + K_a)(1 + K_b)   . . . (xxxi)

This must be an identity if K_a = K_ab = K_b, in which case (xxix) implies that 

    A_w^2 V_w / (A_u^2 V_u) = B_w^2 V_w / (B_u^2 V_u)


Thus we may say that one umpire Z can replace the two umpires U and W if the multiples 
of the two umpires' scores respectively allocated to player A and player B are in the same ratio. 
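
The condition can be tested numerically. The sketch below is an illustration under stated assumptions, not a transcription of the text: it uses the fact that a single umpire Z must reproduce the two bonus variances and the covariance simultaneously, which is possible only if the square of the covariance equals the product of the two bonus variances. The function name `replaceable` and the numerical multipliers are invented for the example.

```python
from fractions import Fraction

def replaceable(A_u, A_w, B_u, B_w, V_u, V_w):
    """True if a single umpire can reproduce both bonus variances and the covariance.

    A single umpire Z needs A_z^2 V_z, B_z^2 V_z and A_z B_z V_z to equal the
    three two-umpire quantities below, which requires cov^2 = var_a * var_b.
    """
    var_a = A_u**2 * V_u + A_w**2 * V_w      # bonus part of A's variance
    var_b = B_u**2 * V_u + B_w**2 * V_w      # bonus part of B's variance
    cov = A_u * B_u * V_u + A_w * B_w * V_w  # covariance of the players' scores
    return cov**2 == var_a * var_b

V_u, V_w = Fraction(35, 12), Fraction(1, 2)  # illustrative variances (cubical die, two coin tosses)
same_ratio = replaceable(2, 6, 3, 9, V_u, V_w)       # A_w/A_u = 3 = B_w/B_u
different_ratio = replaceable(3, 4, 2, 5, V_u, V_w)  # ratios 4/3 and 5/2 differ
print(same_ratio, different_ratio)
```

With the multipliers of Example 1 (3, 4 for A and 2, 5 for B) the ratios differ, so that on this reckoning the two umpires of that example are not reducible to one.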




When this is so (xxx) and (xxxi) imply that 


AS AN Ba UL / 
A Kaa; B=-2 (1 + Kap), 


Oo 


Ag= 


Ae ge ae 3 
A ek a 
m o oe Do 


EXERCISE 12.01 


1. As in Example 1 above, umpire U tosses an ordinary cubical die once. Umpire W tosses one 
coin twice (scoring heads as 1 and tails as 0). Player A tosses the tetrahedral die of Fig. 70 (scores 1, 
2, 3 in the ratio 1 : 2:1). Player B tosses once the tetrahedral die of Fig. 73 (scores 1 and 2 in the ratio 
1:3). The final score of A is twice his individual score added to 3 times the score of umpire U and 
four times the score of umpire W. The final score of B is three times his individual score together 
with twice that of umpire U and five times that of umpire W, i.e. 


    x_a = 3x_u + 4x_w + 2x_{a.0}   and   x_b = 2x_u + 5x_w + 3x_{b.0}.

Check the following results : 

    M_a = 18.5 ;  V_a = 36.25 ;  M_b = 17.25 ;  V_b = 25.8542 ; 
    Cov (x_a, x_b) = 27.5 ;  r_ab = 0.898. 


2. As in Example 2 umpire U tosses a cubical die once. Player A first tosses the circular die 
of Fig. 67 twice, and then tosses once the die of Fig. 70, counting as his individual score four times 
the result obtained with the circular die added to the result of tossing the tetrahedral die. Player B 
tosses the die of Fig. 73 once, and adds to five times his score the result of tossing a coin twice, the sum 
being the individual score. The final score of A is three times that of the umpire added to his individual 
score. B adds to his individual score twice that of the umpire, i.e. 


Xq = 3x, + 4x,, + Cae and x, = 2m, + Sx, + X%e- 
Check the following : 
    M_a = 24.5 ;  V_a = 34.75 ;  M_b = 16.75 ;  V_b = 16.85 ; 
    Cov (x_a, x_b) = 17.5 ;  r_ab = 0.72. 


3. For Example 1 determine the variances of the score distributions of both umpires and express 
their relationship to the covariance of the score distributions of the two players. 


4. Show that the variance (V_{a.r}) of the player's score distribution is the same for all values of the 
umpire's score when the definitive equation is x_a = A_u x_u + A_0 x_{a.0} without restriction on the origin or 
scale of the distribution of either component. 


5. Check the formulae for partial correlation given in Chapter 9 by withholding first the bonus 
of one umpire, then the bonus of the other. 


6. Examine the effect on the set-up of Example 2 on the covariance of the score distribution of 
the players and of the value of the correlation coefficient, if 
(i) Player A tosses only the die of Fig. 70 and player B tosses only the die of Fig. 73. 
(ii) Player A tosses only the die of Fig. 67 and player B tosses only the die of Fig. 73. 
(iii) Player A tosses only the die of Fig. 70 and player B tosses only the coin. 
(iv) Player A tosses only the die of Fig. 67 and player B tosses only the coin. 
Compare the results with those of Example 2 with special reference to the variances of the score 
tributions of the players, and interpret them. 


7. An umpire tosses a coin three times scoring heads as successes. Player A takes three cards 
with replacement from a full pack counting the number of hearts as his individual score. Player B 
tosses a cubical die once. Player A records as his final score the result of deducting three times the score 
of the umpire from twice his individual score. Player B adds to his individual score twice that of the 
umpire. Investigate the joint distribution of the scores of the two players and that of each player with 
that of the umpire. 


Fig. 91. A Non-replacement Model. Player A takes 3 cards from a full pack without replacement and player B 
then takes 2, each recording as his or her score the number of hearts in the sample. 


12.02 THE NON-REPLACEMENT MODEL 


Whether we score by the taxonomic or by the representative method distinguished in 7.01 of Vol. I, 
the withdrawal of a sample without replacement limits the possible score value of a sample drawn 
subsequently from the same universe. If one player takes a sample containing all the highest 
cards, a second player must evidently have a lower mean score than otherwise. Hence there 
will be a negative correlation between the players’ scores. Correlation of this sort is the theme 
of the models whose properties we shall now explore. A numerical example involving the 
taxonomic method of scoring will clarify our task : 
Example. From a 6-card pack consisting of 2 clubs and 4 hearts, the first player (A) simultaneously 
draws two cards and the second player (B) draws three from the residual pack of four. A's heart 
score (0, 1 or 2) distribution is given by successive terms of (2 + 4)^(2) / 6^(2), viz. 


    A score . . . . . . . .    0    1    2
    Frequency (× 15) . . . .   1    8    6


The residual packs from which B draws are as follows : 


    A's heart score . . . .    0    1    2

    Residual : 
      hearts . . . . . . .     4    3    2
      clubs . . . . . . . .    0    1    2


If A’s heart score is 0, that of B is necessarily 3 for the 3-fold draw. If A’s heart score is 1, the dis- 
tribution of B’s heart score of 0, 1, 2 or 3 accords with the terms of (1 + 3)(/4@), If A’s heart score 
is 2, the appropriate binomial is (2 + 2))/4@), Thus we get a frequency table : 


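    B’s heart score         0    1    2    3
    Frequency (× 4)
      when $x_a = 0$        0    0    0    4
      when $x_a = 1$        0    0    3    1
      when $x_a = 2$        0    2    2    0

To obtain the correlation grid we have to weight the above by corresponding frequencies of A’s score, and obtain (frequencies × 60)

                          B’s heart score
    A’s heart score     0     1     2     3    Total    Mean
           0            0     0     0     4       4     3
           1            0     0    24     8      32     9/4
           2            0    12    12     0      24     3/2
         Total          0    12    36    12      60     2
         Mean           -     2    4/3   2/3    4/3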


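From the above we obtain

$$V_a = \tfrac{16}{45}; \quad V_b = \tfrac{2}{5};$$
$$k_{ba} = -\tfrac{3}{4}; \quad k_{ab} = -\tfrac{2}{3};$$
$$\operatorname{Cov}(x_a, x_b) = -\tfrac{4}{15} = k_{ba}V_a = k_{ab}V_b;$$
$$r_{ab} = -1/\sqrt{2}.$$

We shall later see that $r_{ab}$ depends only on the sampling fractions defined by $n \cdot f_a = a$ and $n \cdot f_b = b$, and that in fact $(f_a \cdot f_b) \div (1 - f_a)(1 - f_b) = r_{ab}^2$. In the above example,

$$f_a = \tfrac{2}{6} = \tfrac{1}{3}; \quad f_b = \tfrac{3}{6} = \tfrac{1}{2};$$
$$\therefore\; (f_a \cdot f_b) \div (1 - f_a)(1 - f_b) = \tfrac{1}{2} = r_{ab}^2.$$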
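The figures of the foregoing example can be checked by brute force. The following is a minimal Python sketch, not part of the original text: it enumerates every equally likely pair of draws and computes the moments with exact fractions. The helper names `non_replacement_grid` and `moments` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations

def non_replacement_grid(pack, a, b, score):
    """Enumerate every way A can take `a` cards and B then `b` cards from
    the residue; return the joint distribution of (A-score, B-score)."""
    counts = {}
    positions = range(len(pack))
    for a_hand in combinations(positions, a):
        rest = [i for i in positions if i not in a_hand]
        for b_hand in combinations(rest, b):
            key = (score([pack[i] for i in a_hand]),
                   score([pack[i] for i in b_hand]))
            counts[key] = counts.get(key, 0) + 1
    total = sum(counts.values())
    return {k: Fraction(v, total) for k, v in counts.items()}

def moments(joint):
    """Means, variances and covariance of a joint score distribution."""
    ma = sum(p * xa for (xa, xb), p in joint.items())
    mb = sum(p * xb for (xa, xb), p in joint.items())
    va = sum(p * (xa - ma) ** 2 for (xa, xb), p in joint.items())
    vb = sum(p * (xb - mb) ** 2 for (xa, xb), p in joint.items())
    cov = sum(p * (xa - ma) * (xb - mb) for (xa, xb), p in joint.items())
    return ma, mb, va, vb, cov

pack = ["C", "C", "H", "H", "H", "H"]          # 2 clubs and 4 hearts
hearts = lambda hand: hand.count("H")           # taxonomic score
joint = non_replacement_grid(pack, 2, 3, hearts)
ma, mb, va, vb, cov = moments(joint)
r_squared = cov ** 2 / (va * vb)
print(va, vb, cov, r_squared)                   # 16/45 2/5 -4/15 1/2
```

The same two helpers accept any pack and any scoring rule, so they may also serve for the representative method of scoring by passing `sum` as the score function.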


The Two-class Universe 


When we employ the taxonomic method of scoring as in the foregoing example, we may make use of relations we have already obtained in Chapter 3 of Vol. I. For illustrative purposes we assume that A and B each records the number ($x_a$ or $x_b$) of hearts in the sample, the proportion of hearts in the full pack being $p = (1 - q)$. For an $r$-fold non-replacement sample, we have obtained the mean and variance of the score distribution on pp. 102 and 138 of Vol. I, viz.

$$M_r = rp \quad \text{and} \quad V_r = \frac{r(n - r)}{n - 1}\,pq. \tag{i}$$


If A had already drawn $a$ cards of which $x_a$ are hearts from the pack containing $np$ hearts in all, the universe from which B draws a $b$-fold sample contains $(n - a)$ cards of which $(np - x_a)$ are hearts, so that the mean score of B for a fixed score $x_a$ of player A is given by

$$M_{b.a} = E_{b.a}(x_b) = \frac{b}{n - a}(np - x_a).$$

Since $M_a = ap$ and $M_b = bp$,

$$\therefore\; M_{b.a} - M_b = -\frac{b}{n - a}(x_a - M_a). \tag{ii}$$

Thus regression of $x_b$ on $x_a$ is linear, and by analogous reasoning the converse is true. For that of $x_b$ on $x_a$ we may write the regression coefficient as

$$k_{ba} = -\frac{b}{n - a}. \tag{iii}$$

From (i) we have

$$V_a = \frac{a(n - a)}{n - 1}\,pq. \tag{iv}$$

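Since regression is linear we may obtain the covariance directly from the relation

$$\operatorname{Cov}(x_a, x_b) = k_{ba}V_a = -\frac{ab}{n - 1}\,pq. \tag{v}$$

Alternatively, we have

$$\operatorname{Cov}(x_a, x_b) = E(x_a \cdot x_b) - M_aM_b = E(x_a \cdot M_{b.a}) - abp^2.$$

In this expression

$$x_a \cdot M_{b.a} = \frac{b \cdot x_a(np - x_a)}{n - a} = \frac{nbp \cdot x_a}{n - a} - \frac{b \cdot x_a^2}{n - a},$$
$$\therefore\; E(x_a \cdot M_{b.a}) = \frac{nbp \cdot M_a}{n - a} - \frac{b}{n - a}E(x_a^2).$$

Since $E(x_a^2) = V_a + M_a^2$,

$$E(x_a \cdot M_{b.a}) = \frac{nabp^2}{n - a} - \frac{a^2bp^2}{n - a} - \frac{ab}{n - 1}\,pq = abp^2 - \frac{ab}{n - 1}\,pq.$$

Whence as above

$$\operatorname{Cov}(x_a, x_b) = -\frac{ab}{n - 1}\,pq.$$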


To obtain an expression for $r_{ab}$ we need to evaluate $V_b$. If we denote the variance of the distribution of an $(a + b)$-fold sample score $(x_a + x_b)$ as $V_{a+b}$,

$$V_{a+b} = V_a + V_b + 2\operatorname{Cov}(x_a, x_b).$$

From (i) above

$$V_{a+b} = \frac{(n - a - b)(a + b)}{n - 1}\,pq.$$

Whence we have

$$V_b = \frac{(n - a - b)(a + b)\,pq}{n - 1} - \frac{(n - a)a\,pq}{n - 1} + \frac{2ab\,pq}{n - 1} = \frac{b(n - b)}{n - 1}\,pq, \tag{vi}$$
$$\therefore\; \sqrt{V_aV_b} = \frac{\sqrt{ab(n - a)(n - b)}}{n - 1}\,pq,$$
$$\therefore\; r_{ab} = -\sqrt{\frac{ab}{(n - a)(n - b)}}. \tag{vii}$$

If we use $f_a$ and $f_b$ for the sampling fractions as above

$$r_{ab} = -\sqrt{\frac{f_a \cdot f_b}{(1 - f_a)(1 - f_b)}}. \tag{viii}$$

In particular, when $a = b$ so that $f_a = f_b = f$,

$$r_{ab} = -\frac{f}{1 - f}.$$

When $f = \tfrac{1}{2}$, $r_{ab} = -1$, since there is only one sample which B can take for any one sample A has already taken.
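A short Python sketch, offered only as a check and not as part of the original text, may make the content of (viii) more concrete: it builds the exact joint distribution of the two heart scores from hypergeometric terms and compares the squared coefficient with the expression in sampling fractions. The function names and the trial values of $n$, $a$, $b$ and the number of hearts are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def joint_heart_scores(n, hearts, a, b):
    """Exact joint distribution of (x_a, x_b): A draws a cards, then B draws
    b cards from the residue, both without replacement."""
    others = n - hearts
    joint = {}
    for xa in range(max(0, a - others), min(a, hearts) + 1):
        pa = Fraction(comb(hearts, xa) * comb(others, a - xa), comb(n, a))
        h, c = hearts - xa, others - (a - xa)    # residual hearts and others
        for xb in range(b + 1):
            # math.comb returns 0 for impossible selections (k > n)
            pb = Fraction(comb(h, xb) * comb(c, b - xb), comb(n - a, b))
            joint[(xa, xb)] = pa * pb
    return joint

def r_squared(joint):
    """Square of the product-moment coefficient of a joint distribution."""
    ma = sum(p * xa for (xa, _), p in joint.items())
    mb = sum(p * xb for (_, xb), p in joint.items())
    va = sum(p * (xa - ma) ** 2 for (xa, _), p in joint.items())
    vb = sum(p * (xb - mb) ** 2 for (_, xb), p in joint.items())
    cov = sum(p * (xa - ma) * (xb - mb) for (xa, xb), p in joint.items())
    return cov * cov / (va * vb)

for n, hearts, a, b in [(6, 4, 2, 3), (52, 13, 3, 2), (10, 3, 4, 4)]:
    fa, fb = Fraction(a, n), Fraction(b, n)
    predicted = fa * fb / ((1 - fa) * (1 - fb))
    assert r_squared(joint_heart_scores(n, hearts, a, b)) == predicted
    print(n, hearts, a, b, predicted)
```

The number of hearts in the pack does not enter the predicted value, which is the point of (viii).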

From (i) and (vi) we see that the variance of B’s score distribution is exactly as it would be if A had not previously drawn an $a$-fold sample. This conclusion is at first surprising, and we may arrive at it by an alternative route. In virtue of the fundamental tautology of the grid, we have

$$V_b = V(M_{b.a}) + M(V_{b.a}). \tag{ix}$$

Also, in virtue of linear regression

$$V(M_{b.a}) = k_{ba}^2 \cdot V_a = \frac{ab^2 \cdot pq}{(n - a)(n - 1)}. \tag{x}$$

In (ix) we may write

$$M(V_{b.a}) = E_a(V_{b.a}).$$
The variance ($V_{b.a}$) of B’s score distribution for a fixed value of $x_a$ is the variance of a $b$-fold sample score distribution from an $(n - a)$-fold universe in which the number of hearts is $(np - x_a)$. So we may write

$$V_{b.a} = \frac{b(n - a - b)}{n - a - 1} \cdot \frac{(np - x_a)}{(n - a)} \cdot \frac{(n - a - np + x_a)}{(n - a)},$$
$$\therefore\; V_{b.a} = \frac{b(n - a - b)}{(n - a - 1)(n - a)^2}\left[n^2p - n^2p^2 - anp + (2np - n + a)x_a - x_a^2\right],$$
$$\therefore\; M(V_{b.a}) = \frac{b(n - a - b)}{(n - a - 1)(n - a)^2}\,E_a\left[n^2p - n^2p^2 - anp + (2np - n + a)x_a - x_a^2\right].$$

In this expression $E_a(x_a^2) = V_a + M_a^2$, whence

$$E_a\left[n^2p - n^2p^2 - anp + (2np - n + a)x_a - x_a^2\right] = pq(n - a)^2 - V_a,$$
$$\therefore\; M(V_{b.a}) = \frac{b(n - a - b)\,pq}{n - a - 1} - \frac{ab(n - a - b)\,pq}{(n - a)(n - a - 1)(n - 1)}.$$

Whence from (ix) and (x)

$$V_b = \frac{b(n - a - b)\,pq}{n - a - 1} - \frac{ab(n - a - b)\,pq}{(n - a)(n - a - 1)(n - 1)} + \frac{ab^2\,pq}{(n - a)(n - 1)}.$$

Hence in agreement with (vi):

$$V_b = \frac{b(n - b)}{n - 1}\,pq.$$
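The alternative route just traced lends itself to a numerical check. The sketch below (Python, exact fractions; an illustration rather than anything in the original) computes $V(M_{b.a})$ and $M(V_{b.a})$ separately and confirms that their sum is the variance B would have if A had drawn nothing. The function names and the choice of a full pack with $a = 3$, $b = 2$ are assumptions made for the illustration.

```python
from fractions import Fraction
from math import comb

def hyper_pmf(size, hearts, draw):
    """Distribution of the heart score of a draw-fold sample taken without
    replacement from `size` cards of which `hearts` are hearts."""
    low = max(0, draw - (size - hearts))
    high = min(draw, hearts)
    return {x: Fraction(comb(hearts, x) * comb(size - hearts, draw - x),
                        comb(size, draw))
            for x in range(low, high + 1)}

def mean_var(pmf):
    m = sum(p * x for x, p in pmf.items())
    return m, sum(p * (x - m) ** 2 for x, p in pmf.items())

def partition_vb(n, hearts, a, b):
    """Return (V(M_b.a), M(V_b.a)) for the non-replacement model."""
    pa = hyper_pmf(n, hearts, a)
    # mean and variance of B's score for each fixed score of A
    cond = {xa: mean_var(hyper_pmf(n - a, hearts - xa, b)) for xa in pa}
    mb = sum(pa[xa] * cond[xa][0] for xa in pa)
    v_of_m = sum(pa[xa] * (cond[xa][0] - mb) ** 2 for xa in pa)
    m_of_v = sum(pa[xa] * cond[xa][1] for xa in pa)
    return v_of_m, m_of_v

n, hearts, a, b = 52, 13, 3, 2
v_of_m, m_of_v = partition_vb(n, hearts, a, b)
p = Fraction(hearts, n)
unconditional = Fraction(b * (n - b), n - 1) * p * (1 - p)   # (vi)
print(v_of_m, m_of_v, v_of_m + m_of_v == unconditional)
```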


Representative Scoring 


We shall now assume that the pack contains no picture cards. Accordingly we may score the 
sample by adding the number of pips on each card, e.g. the score of a 3-fold sample consisting 
of the ace of clubs, the 3 of spades and the 7 of hearts will be 11. Whether the pack contains 
equal numbers of cards with the same number of pips is immaterial. 

As before we assume that player A draws $a$ cards from the full pack of $n$ cards without replacement and player B draws $b$ cards without replacement from the residual pack of $(n - a)$ cards. We shall denote the total score (score-sum) of player A by $s_a$ and that of player B by $s_b$. We may assign to each individual card a rank referable to the order in which the player draws it if he draws the cards successively, or to their place in a face upwards sequence if drawn simultaneously. We may then label the score of the $u$th card drawn as $x_u$. The subscript has nothing to do with the numerical value of the score, its function being to label the card as a removal, i.e. one which the residual pack does not contain. By definition, then

$$s_a = \sum_{u=1}^{u=a} x_u \quad \text{and} \quad s_b = \sum_{u=a+1}^{u=a+b} x_u. \tag{xi}$$


If we write the second zero moment of the two score distributions as $\mu_{2a}$ and $\mu_{2b}$ the corresponding variances are

$$V_a = \mu_{2a} - M_a^2 \quad \text{and} \quad V_b = \mu_{2b} - M_b^2. \tag{xii}$$

In these expressions

$$M_a = E(s_a) = E(x_1 + x_2 + x_3 \ldots x_a) = E(x_1) + E(x_2) \ldots + E(x_a), \tag{xiii}$$
$$\mu_{2a} = E(s_a^2) = E(x_1 + x_2 + x_3 \ldots x_a)^2 = E(x_1^2) + E(x_2^2) + E(x_3^2) \ldots \text{etc.} + 2E(x_1 \cdot x_2) + 2E(x_1 \cdot x_3) \ldots \text{etc.} \tag{xiv}$$


To evaluate $M_a$ and $\mu_{2a}$ we must first define $E(x_u)$ and $E(x_u^2)$, i.e. the mean value of the unit score and of its square regardless of the order of choice. With this end in view, we shall denote the score-sum of all the cards in the $n$-fold pack by $S_n$, and the sum of the squares of all the individual card scores as $S_{2.n}$, so that

$$S_n = \sum_{u=1}^{u=n} x_u \quad \text{and} \quad S_{2.n} = \sum_{u=1}^{u=n} x_u^2. \tag{xv}$$

For the corresponding score-sums of all the cards in the residual pack of $(n - p + 1)$ cards after extraction of a $(p - 1)$-fold sample, i.e. immediately before extraction of the $p$th card, we may use $S_{n-p+1}$ and $S_{2.(n-p+1)}$, so that

$$S_{n-p+1} = \sum_{u=p}^{u=n} x_u \quad \text{and} \quad S_{2.(n-p+1)} = \sum_{u=p}^{u=n} x_u^2. \tag{xvi}$$


Let us now consider the result of two consecutive draws, the $p$th and the $(p + 1)$th. Since the player takes a card from a residual universe of $(n - p + 1)$ cards at the $p$th draw we may write the mean score drawn as

$$E(x_p) = \frac{E(S_{n-p+1})}{n - p + 1}.$$

The operation $E$ here refers to all possible values $x_p$ may take but involves no restriction on the residual universe other than the fact that it contains the card whose score value is $x_p$. Hence we may write with equal propriety

$$E(x_p) = E_p(x_p) = \frac{E_p(S_{n-p+1})}{n - p + 1}. \tag{xvii}$$

If we write $q = (p + 1)$, the operation $E_{q.p}$ signifies taking the mean value of the score at the $q$th draw after removing the particular card whose score is $x_p$ at the preceding draw, and this is equivalent to finding the mean score of a unit sample from a residual pack of $(n - p)$ cards whose score-sum is $S_{n-p+1} - x_p$. Thus

$$E_{q.p}(x_q) = \frac{S_{n-p+1} - x_p}{n - p}.$$

The mean value of $x_q$ is the weighted mean of the above for all values of the $p$th draw, i.e.

$$E(x_q) = E_p \cdot E_{q.p}(x_q) = \frac{E_p(S_{n-p+1}) - E_p(x_p)}{n - p}.$$

Whence from (xvii)

$$E(x_q) = \frac{(n - p + 1)E(x_p) - E(x_p)}{n - p},$$
$$\therefore\; E(x_q) = E(x_p),$$
$$\therefore\; E(x_{p+1}) = E(x_p).$$


Thus the mean value of the card drawn is independent of choice and therefore $E(x_p) = E(x_1)$. If we denote the mean value of the first card drawn from the full pack as $M_1 = E(x_1)$, we therefore have

$$M_a = E(x_1) + E(x_2) + \ldots E(x_a) = aM_1. \tag{xviii}$$

Similarly, $M_b = b \cdot M_1$. If we substitute $x_p^2$, $x_q^2$ and $S_{2.(n-p+1)}$ for $x_p$, $x_q$ and $S_{n-p+1}$ in the foregoing argument and write $E(x_1^2) = \mu_2$, we derive in the same way

$$E(x_u^2) = \frac{S_{2.n}}{n} = \mu_2. \tag{xix}$$


To complete the evaluation of $V_a$ and $V_b$ we also need to determine the mean value of the product terms in (xiv) above. If we write $x_u \cdot x_w$ as the product of any two unit scores of the sample, its mean value is the weighted mean of the product of $x_u$ and the mean value of $x_w$ for the same fixed value of $x_u$, i.e. in the symbolism of 11.01-11.04:

$$E(x_u \cdot x_w) = E_u\left[x_u \cdot E_{w.u}(x_w)\right].$$

To interpret the meaning of the operation $E_{w.u}$ we recall that the only restriction implicit is that the residual universe does not contain $x_u$, and we have already seen that the mean value of $x_w$ does not depend on order of choice. Hence $E_{w.u}$ means taking the mean value of $x_w$ from a pack which does not contain $x_u$, i.e. from an $(n - 1)$-fold residual universe of which the score-sum is $(S_n - x_u)$, so that

$$E_{w.u}(x_w) = \frac{S_n - x_u}{n - 1}.$$

In this expression $S_n = nM_1$, so that

$$E_{w.u}(x_w) = \frac{n}{n - 1}M_1 - \frac{x_u}{n - 1},$$
$$\therefore\; x_u \cdot E_{w.u}(x_w) = \frac{n}{n - 1}M_1x_u - \frac{x_u^2}{n - 1},$$
$$\therefore\; E(x_u \cdot x_w) = \frac{n}{n - 1}M_1E(x_u) - \frac{E(x_u^2)}{n - 1}.$$

Whence from (xix)

$$E(x_u \cdot x_w) = \frac{n}{n - 1}M_1^2 - \frac{\mu_2}{n - 1}. \tag{xx}$$


Thus the mean value of the cross products in (xiv) is independent of the order of choice. In (xiv) the number of terms is $a^2$, of which $a$ have the form $x_u^2$ and $a(a - 1)$ have the form $x_u \cdot x_w$, the former being the diagonal terms when we lay out the operation of squaring gridwise, e.g. for a 3-fold sample:

    x_1^2      x_1 x_2    x_1 x_3
    x_2 x_1    x_2^2      x_2 x_3
    x_3 x_1    x_3 x_2    x_3^2

Thus we may write (xiv) in the form

$$\mu_{2a} = a \cdot E(x_u^2) + a(a - 1)E(x_u \cdot x_w) = a\mu_2 + \frac{na(a - 1)}{n - 1}M_1^2 - \frac{a(a - 1)}{n - 1}\mu_2,$$
$$\therefore\; V_a = \mu_{2a} - a^2M_1^2 = \frac{a(n - a)}{n - 1}\mu_2 - \frac{a(n - a)}{n - 1}M_1^2,$$
$$\therefore\; V_a = \frac{a(n - a)}{n - 1}(\mu_2 - M_1^2). \tag{xxi}$$


At every step in the foregoing argument, we have shown that $E(x_u)$, $E(x_u^2)$, $E(x_u \cdot x_w)$ do not depend on the order of choice, hence the form of $M_b$ and $V_b$ is analogous, i.e.

$$M_b = b \cdot M_1 \quad \text{and} \quad V_b = \frac{b(n - b)}{n - 1}(\mu_2 - M_1^2). \tag{xxii}$$
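The variance formula for a score-sum drawn without replacement, i.e. $a(n - a)(\mu_2 - M_1^2) \div (n - 1)$, holds whatever the composition of the pack. The following Python sketch, which is an illustration and not part of the original text, checks it by enumerating every equally likely sample from a small pack. The pip values in `pack` are hypothetical, and the function names are assumptions.

```python
from fractions import Fraction
from itertools import combinations

def score_sum_variance(pack, a):
    """Exact variance of the score-sum of an a-fold sample drawn without
    replacement, by enumerating all equally likely combinations."""
    sums = [sum(hand) for hand in combinations(pack, a)]
    mean = Fraction(sum(sums), len(sums))
    return sum((s - mean) ** 2 for s in sums) / len(sums)

def formula_variance(pack, a):
    """a(n - a)/(n - 1) times the unit-sample variance (mu_2 - M_1^2)."""
    n = len(pack)
    m1 = Fraction(sum(pack), n)                    # mean unit score M_1
    mu2 = Fraction(sum(x * x for x in pack), n)    # second zero moment mu_2
    return Fraction(a * (n - a), n - 1) * (mu2 - m1 ** 2)

pack = [1, 1, 2, 3, 3, 5, 8]      # hypothetical pip values, repeats allowed
for a in range(1, len(pack)):
    assert score_sum_variance(pack, a) == formula_variance(pack, a)
print(formula_variance(pack, 3))  # 524/49
```

Because `combinations` treats cards by position, two cards with the same number of pips count as distinct cards, in keeping with the remark that equal numbers of cards of each denomination are not required.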


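The value of $\operatorname{Cov}(s_a, s_b)$ is deducible from (xx) since

$$\operatorname{Cov}(s_a, s_b) = E(s_a \cdot s_b) - M_aM_b = E(s_a \cdot s_b) - abM_1^2;$$
$$s_a \cdot s_b = (x_1 + x_2 \ldots + x_a)(x_{a+1} + x_{a+2} \ldots + x_{a+b}).$$

In the expansion of the last product there are $ab$ terms, i.e.

$$E(s_a \cdot s_b) = ab \cdot E(x_u \cdot x_w) = \frac{abn}{n - 1}M_1^2 - \frac{ab}{n - 1}\mu_2,$$
$$\therefore\; \operatorname{Cov}(s_a, s_b) = \frac{abn}{n - 1}M_1^2 - \frac{ab}{n - 1}\mu_2 - abM_1^2,$$
$$\therefore\; \operatorname{Cov}(s_a, s_b) = -\frac{ab}{n - 1}(\mu_2 - M_1^2). \tag{xxiii}$$

From (xxii) above we see that

$$\frac{V_b}{V_a} = \frac{b(n - b)}{a(n - a)}, \tag{xxiv}$$
$$\therefore\; r_{ab} = -\sqrt{\frac{ab}{(n - a)(n - b)}}. \tag{xxv}$$

If we denote the sampling fractions by $f_a$ and $f_b$ as above (i.e. $a = nf_a$ and $b = nf_b$),

$$r_{ab} = -\sqrt{\frac{f_a \cdot f_b}{(1 - f_a)(1 - f_b)}}. \tag{xxvi}$$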
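We may arrive at (xxiii) by an alternative route. If $s_a$ is fixed we have

$$M_{b.a} = E_{b.a}(s_b) = E_{b.a}(x_{a+1} + x_{a+2} \ldots x_{a+b}) = b \cdot E_{b.a}(x_u).$$

In this expression $b \cdot E_{b.a}(x_u)$ is the mean value of a $b$-fold score-sum from an $(n - a)$-fold universe of which the score-sum is $(S_n - s_a)$, so that

$$M_{b.a} = \frac{b}{n - a}(S_n - s_a) = \frac{b}{n - a}(nM_1 - s_a),$$
$$\therefore\; M_{b.a} - M_b = \frac{b}{n - a}(nM_1 - s_a) - bM_1 = -\frac{b}{n - a}(s_a - aM_1),$$
$$\therefore\; M_{b.a} - M_b = -\frac{b}{n - a}(s_a - M_a). \tag{xxvii}$$

There is therefore linear regression of $s_b$ on $s_a$, and (by the same reasoning) of $s_a$ on $s_b$, and the regression coefficient in (xxvii) is

$$k_{ba} = -\frac{b}{n - a}.$$

In virtue of linear regression, we may write

$$\operatorname{Cov}(s_a, s_b) = k_{ba}V_a = -\frac{b}{n - a} \cdot \frac{a(n - a)}{n - 1}(\mu_2 - M_1^2) = -\frac{ab}{n - 1}(\mu_2 - M_1^2).$$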


The reader will note that (viii) and (xxvi) are identical expressions. Also the appropriate expression for $V_a$ or $V_b$ reduces to that of the hypergeometric distribution for the 2-class case since we may write the variance of the unit sample distribution in the form

$$V_1 = \mu_2 - M_1^2.$$

For the 2-class universe $V_1 = pq$ and (xxi) reduces to

$$V_a = \frac{a(n - a)}{n - 1}\,pq.$$
Numerical Example. From a pack of six cards consisting of the ace, 2, 3, 4, 5 and 6 of clubs, each player takes two cards without replacement, recording as his score the total number of pips. Since A draws twice, he may select any one of $^6C_2 = 15$ combinations, and B may choose any one of $^4C_2 = 6$ residual combinations which we set out in the following schema:

    A’s choice    Possible choice of B
        12        34  35  36  45  46  56
        13        24  25  26  45  46  56
        14        23  25  26  35  36  56
        15        23  24  26  34  36  46
        16        23  24  25  34  35  45
        23        14  15  16  45  46  56
        24        13  15  16  35  36  56
        25        13  14  16  34  36  46
        26        13  14  15  34  35  45
        34        12  15  16  25  26  56
        35        12  14  16  24  26  46
        36        12  14  15  24  25  45
        45        12  13  16  23  26  36
        46        12  13  15  23  25  35
        56        12  13  14  23  24  34

The corresponding scores of the samples set out in the foregoing schema are as follows:

    A’s choice    A-score    Possible B-scores
        12           3        7   8   9   9  10  11
        13           4        6   7   8   9  10  11
        14           5        5   7   8   8   9  11
        15           6        5   6   8   7   9  10
        16           7        5   6   7   7   8   9
        23           5        5   6   7   9  10  11
        24           6        4   6   7   8   9  11
        25           7        4   5   7   7   9  10
        26           8        4   5   6   7   8   9
        34           7        3   6   7   7   8  11
        35           8        3   5   7   6   8  10
        36           9        3   5   6   6   7   9
        45           9        3   4   7   5   8   9
        46          10        3   4   6   5   7   8
        56          11        3   4   5   5   6   7


Each of the 2-fold samples either A or B can draw admits of two permutations. So the number of permutations corresponding to particular scores in this lay-out are in the same ratio as the number of combinations. Consequently, the required frequencies of particular A-scores associated with particular B-scores are as exhibited above: and we may summarise the result as a correlation table in which the drift of figures is downwards from the top right-hand corner to the left lower corner:

                                   A-score
    B-score      3     4     5     6     7     8     9    10    11   Total   Mean   Variance
        3        .     .     .     .     1     1     2     1     1      6    9.0     5/3
        4        .     .     .     1     1     1     1     1     1      6    8.5    35/12
        5        .     .     2     1     2     2     2     1     2     12    8.0     4
        6        .     1     1     2     2     2     2     1     1     12    7.5    47/12
        7        1     1     2     2     6     2     2     1     1     18    7.0    35/9
        8        1     1     2     2     2     2     1     1     .     12    6.5    47/12
        9        2     1     2     2     2     1     2     .     .     12    6.0     4
       10        1     1     1     1     1     1     .     .     .      6    5.5    35/12
       11        1     1     2     1     1     .     .     .     .      6    5.0     5/3
    Total        6     6    12    12    18    12    12     6     6     90    7      14/3
    Mean        9.0   8.5   8.0   7.5   7.0   6.5   6.0   5.5   5.0     7
    Variance    5/3  35/12   4   47/12 35/9  47/12   4   35/12  5/3   14/3
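From the data contained in the correlation table we obtain

$$\operatorname{Cov}(x_a, x_b) = -\tfrac{7}{3}; \quad V_a = \tfrac{14}{3} = V_b;$$
$$\therefore\; r_{ab} = -\tfrac{1}{2}.$$

For the inter-class and intra-class variances we have

$$V(M_{a.b}) + M(V_{a.b}) = \tfrac{7}{6} + \tfrac{7}{2} = \tfrac{14}{3} = V_a;$$
$$V(M_{b.a}) + M(V_{b.a}) = \tfrac{7}{6} + \tfrac{7}{2} = \tfrac{14}{3} = V_b;$$
$$\therefore\; V(M_{b.a}) \div V_b = V(M_{a.b}) \div V_a = \tfrac{1}{4} = r_{ab}^2.$$

The sampling fractions are $f_a = \tfrac{1}{3} = f_b$,

$$\therefore\; (f_a \cdot f_b) \div (1 - f_a)(1 - f_b) = \tfrac{1}{4} = r_{ab}^2.$$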
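The schema of the numerical example is small enough to enumerate mechanically. The following Python sketch, given as an illustration only and not taken from the original, lists all 90 deals and recovers the covariance, the common variance and the coefficient with exact fractions. Variable names are assumptions.

```python
from fractions import Fraction
from itertools import combinations

pack = [1, 2, 3, 4, 5, 6]                   # ace to 6 of clubs, scored by pips
pairs = []                                  # (A-score, B-score) for every deal
for a_hand in combinations(pack, 2):
    rest = [card for card in pack if card not in a_hand]
    for b_hand in combinations(rest, 2):
        pairs.append((sum(a_hand), sum(b_hand)))

n_deals = len(pairs)                        # 15 choices for A times 6 for B
mean_a = Fraction(sum(sa for sa, _ in pairs), n_deals)
mean_b = Fraction(sum(sb for _, sb in pairs), n_deals)
var_a = sum((sa - mean_a) ** 2 for sa, _ in pairs) / n_deals
var_b = sum((sb - mean_b) ** 2 for _, sb in pairs) / n_deals
cov = sum((sa - mean_a) * (sb - mean_b) for sa, sb in pairs) / n_deals

# With equal variances the coefficient is simply cov / variance.
assert var_a == var_b
r = cov / var_a
print(n_deals, mean_a, var_a, cov, r)       # 90 7 14/3 -7/3 -1/2
```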
Partition of Variance 


Some current text-books suggest to the readers that the square of the product-moment coefficient is a just measure of explained variation. We have seen that this is true of correlation in the consequential domain of the Umpire Bonus set-up, and regression is always then linear. In the concurrent domain the square of the product-moment coefficient is not a true measure of explained variation, though regression may in fact be linear. Regression is linear in both dimensions of the non-replacement model; but no meaningful partition of variance is possible. There is a negative correlation between the score of player B and the antecedent score of player A; but the variance of the B-score distribution is exactly the same as it would be if A had not drawn a sample. Evidently, therefore, the square of the product-moment coefficient is not a measure of how much the circumstance of A’s antecedent choice contributes to the total variance of the distribution of the B-score.


EXERCISE 12.02


Set up the correlation table for the long run result and evaluate $M_a$, $M_b$, $V(M_{b.a})$, $M(V_{b.a})$, $V(M_{a.b})$, $M(V_{a.b})$, $k_{ba}$, $k_{ab}$ and $r_{ab}$ for the following situations. In Examples 4-8 each player adds the umpire’s score to his independent score.


1. A pack consists of 10 cards numbered 1 to 10. A takes 3 without replacement and retains them. 
B takes 4 simultaneously, the players recording their total scores by adding the denominations of the 
cards. 


2. Another pack containing hearts only consists of 6 aces, 3 twos, 4 threes and 2 fives. Player A 
takes 4 without replacement, retaining them. Player B takes 3. 


3. An urn contains 15 balls of which 5 are red and 10 are black. The players draw without replacement, A taking 2, B taking 5. They count as their scores the total number of red balls in the sample.


4. A pack consists of 4 cards numbered 1 to 4. The umpire takes 2 without replacement. After 
noting his score he returns the cards to the pack. The cards are then shuffled, the top two are given 
to A and the other two to B. 


5. The situation develops as in 4. After return of the umpire’s cards to the pack, A takes one 
card and retains it, then B draws two, without replacement. 


6. A pack consists of six cards numbered 1 to 6. The umpire draws two without replacement and retains them. A then takes two from the remainder and B gets the two that are left. Does reversing the order of taking the cards make any difference?


7. A pack consists of 9 cards, numbered 1 to 9. The umpire takes 2, without replacement and 
retains them. A draws 2, without replacement, from the remaining 7, after noting down his score he 
returns them to the pack. B now takes 2 without replacement. 


8. Four black and 2 white balls are placed in an urn. The umpire draws out 2 simultaneously, 
scoring the number of black balls. He replaces them in the urn. A now draws 2 simultaneously 
and retains them. B draws 3 simultaneously from the remainder. 


12.03 THE TWO-PACK MODEL


We shall now examine a model situation, which involves correlation with linear regression in 
both dimensions, as in the foregoing section. As is true of the non-replacement model, the 
causal nexus is not a concurrent relationship like that of the two players of the Umpire Bonus 
set-up. Nor is the type of reciprocal constraint of a sort which we can rightly describe as 
consequential in the sense that the score of the player in 12.01 is consequential to that of the 
umpire. 

We suppose that two players extract samples with replacement, one from a full card pack and one from an otherwise identical pack containing no hearts. Player A takes $a$ cards from the full pack and records his heart score ($x_a$). Player B takes $(a - x_a)$ cards from the pack which contains no hearts and records as his score ($x_b$) the number of diamonds in the sample. We may denote the numbers of cards in the $n$-fold pack from which A draws as follows:

    Hearts     Diamonds    Others
    $np_a$     $np_b$      $np_c$

Evidently $(np_a + np_b + np_c) = n$ and $(p_a + p_b + p_c) = 1$.
[Fig. 92: THE TWO-PACK MODEL. Score grid setting the heart score (h) of player A for a 3-fold draw with replacement from a full pack against the diamond score of B for a (3 - h)-fold draw with replacement from a pack with hearts discarded.]

Fig. 92. The Two-Pack Model. Player A takes 3 cards from one full pack replacing each before drawing another, and recording as his score the number of hearts in the sample. Player B takes a sample from an otherwise full pack after discarding all hearts, the size of sample being 3, 2, 1 or 0 according as A scores h = 0, 1, 2 or 3. The score of player B is the number of diamonds in the (3 - h)-fold sample.
In B’s pack there are $(n - np_a)$ cards of which $np_b$ are diamonds. Thus the proportion of diamonds in B’s pack is $p_b \div (1 - p_a)$, whence

$$M_{b.a} = \frac{(a - x_a)p_b}{1 - p_a} \quad \text{and} \quad V_{b.a} = \frac{(a - x_a)p_b}{1 - p_a}\left(1 - \frac{p_b}{1 - p_a}\right). \tag{i}$$

Evidently, $M_a = ap_a$ and $V_a = ap_a(1 - p_a)$. Since $E_a(M_{b.a}) = M_b$,

$$M_b = \frac{ap_b}{1 - p_a} - \frac{p_bE_a(x_a)}{1 - p_a} = \frac{ap_b - M_a \cdot p_b}{1 - p_a},$$
$$\therefore\; M_{b.a} - M_b = -\frac{p_b}{1 - p_a}(x_a - M_a). \tag{ii}$$

Thus regression of $x_b$ on $x_a$ is linear, the regression coefficient being

$$k_{ba} = -\frac{p_b}{1 - p_a}. \tag{iii}$$

Since regression is linear we may write

$$\operatorname{Cov}(x_a, x_b) = k_{ba}V_a = -ap_ap_b. \tag{iv}$$

The variance of B’s score for a fixed value of $x_a$ is

$$V_{b.a} = \frac{(a - x_a)p_b(1 - p_a - p_b)}{(1 - p_a)^2},$$
$$\therefore\; M(V_{b.a}) = \frac{(a - M_a)p_b(1 - p_a - p_b)}{(1 - p_a)^2} = \frac{ap_b(1 - p_a - p_b)}{1 - p_a}. \tag{v}$$

We thus derive

$$V_b = V(M_{b.a}) + M(V_{b.a}) = \frac{ap_ap_b^2}{1 - p_a} + \frac{ap_b(1 - p_a - p_b)}{1 - p_a},$$
$$\therefore\; V_b = ap_b(1 - p_b). \tag{vi}$$

Hence we have

$$r_{ab}^2 = \frac{p_ap_b}{(1 - p_a)(1 - p_b)}. \tag{vii}$$


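Example. A draws 3 cards from a full pack, scoring hearts as success. B draws $(3 - h)$ cards from a 39-card pack containing 13 diamonds and scores diamonds as success. The frequencies (× 64) are:

                          B score ($x_b$)
    A score ($x_a$)     0     1     2     3    Total    Mean    Variance
          0             8    12     6     1      27      1        2/3
          1            12    12     3     .      27     2/3       4/9
          2             6     3     .     .       9     1/3       2/9
          3             1     .     .     .       1      0         0
        Total          27    27     9     1      64     3/4       9/16
        Mean            1    2/3   1/3    0     3/4
        Variance       2/3   4/9   2/9    0     9/16

$$M(V_{b.a}) = \tfrac{1}{2} = M(V_{a.b}); \quad V(M_{b.a}) = \tfrac{1}{16} = V(M_{a.b});$$
$$V(M_{b.a}) \div V_b = \tfrac{1}{9} = r_{ab}^2 = V(M_{a.b}) \div V_a.$$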
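The relations (iv), (vi) and (vii) of the two-pack model can be checked for the example of a full pack ($p_a = p_b = \tfrac{1}{4}$, $a = 3$). The Python sketch below is an illustration under those stated values and not part of the original text: it builds the joint distribution from two binomials with exact fractions. Function and variable names are assumptions.

```python
from fractions import Fraction
from math import comb

def binom_pmf(trials, p):
    """Terms of the binomial (q + p)^trials as a score -> probability map."""
    return {k: comb(trials, k) * p ** k * (1 - p) ** (trials - k)
            for k in range(trials + 1)}

def two_pack_joint(a, pa, pb):
    """Joint distribution of (x_a, x_b): A draws a cards with replacement
    from the full pack, B draws (a - x_a) with replacement from the pack
    without hearts, where the proportion of diamonds is pb / (1 - pa)."""
    joint = {}
    for xa, prob_a in binom_pmf(a, pa).items():
        for xb, prob_b in binom_pmf(a - xa, pb / (1 - pa)).items():
            joint[(xa, xb)] = prob_a * prob_b
    return joint

a, pa, pb = 3, Fraction(1, 4), Fraction(1, 4)
joint = two_pack_joint(a, pa, pb)
ma = sum(p * xa for (xa, _), p in joint.items())
mb = sum(p * xb for (_, xb), p in joint.items())
va = sum(p * (xa - ma) ** 2 for (xa, _), p in joint.items())
vb = sum(p * (xb - mb) ** 2 for (_, xb), p in joint.items())
cov = sum(p * (xa - ma) * (xb - mb) for (xa, xb), p in joint.items())

assert cov == -a * pa * pb                               # (iv)
assert vb == a * pb * (1 - pb)                           # (vi)
assert cov ** 2 / (va * vb) == pa * pb / ((1 - pa) * (1 - pb))   # (vii)
print(cov, vb, cov ** 2 / (va * vb))                     # -3/16 9/16 1/9
```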
EXERCISE 12.03


Investigate the following model situations and construct a grid like that of Fig. 92 for each.


1. Player A takes 2 cards with replacement from a full pack recording as his score the total number (s) of spades. B takes (2 - s) cards with replacement from a 39-card pack containing no spades, but otherwise complete, and scores diamonds as successes.


2. Player A takes with replacement 3 balls from an urn containing 5 red ones, 7 black and 13 white, recording as his score the number (r) of red balls in the sample. Player B takes (3 - r) balls with replacement from an urn containing only 5 red and 7 black, again scoring red balls.


3. What would be the result if the players of Example 2 did not replace?
[Fig. 93: THE RECTANGULAR UNIT SAMPLE MODEL. Grid of size of pack ($x_a$) against expected score ($x_b$) of the unit sample, with marginal means and variances.]

Fig. 93. Rectangular Unit Sample Model. The player takes one from each of 6 packs respectively containing 1, 2, 3 . . . 6 cards, all clubs numbered consecutively from 1. The marginal entries respectively show size of pack and the player’s expectation.


12.04 THE UNIT SAMPLE RECTANGULAR MODEL


The two preceding models illustrated situations in which correlation goes with linear regression in both dimensions, though we cannot properly regard the law of association as linear in any other sense. We shall now examine a situation in which correlation goes with linear regression in only one dimension. The rule of the game is as follows: the player chooses one card from each of a set of packs containing different numbers of cards consecutively numbered 1, 2, 3 . . . etc., and records as his score the number on the card chosen. In each numerical illustration cited below, the player takes one card from a set of card packs containing 1, 2, 3, up to 6 cards of the club suit starting with the ace. Thus the smallest pack consists of an ace only, the 2-fold pack of the ace and 2, the largest pack containing the ace, 2, 3, 4, 5 and 6 of clubs. The score $x_a$ is the number of cards per pack, the player’s score $x_b$ being the number on the club drawn in the unit trial from any particular pack. The maximum score ($N$) of the player is therefore 6.


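Example 1. There are 6 packs in all, so that the player takes 1 card from each pack in a single set of trials. Thus the A-score distribution is rectangular like that of $x_b$ within a column. The frequencies (× 360) are:

                          No. of cards per pack ($x_a$)
                        1      2      3      4      5      6     Total
    Player’s     1     60     30     20     15     12     10      147
    score        2      .     30     20     15     12     10       87
    ($x_b$)      3      .      .     20     15     12     10       57
                 4      .      .      .     15     12     10       37
                 5      .      .      .      .     12     10       22
                 6      .      .      .      .      .     10       10
    Total              60     60     60     60     60     60      360
    Mean                1     1.5    2.0    2.5    3.0    3.5     2.25
    Variance            0     1/4    2/3    5/4     2    35/12   275/144

$$\frac{V(M_{b.a})}{V_b} = \frac{21}{55} = r_{ab}^2 = \frac{3(N + 1)}{7N + 13}$$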


Example 2. A set of unit trials involves choice of 1 card from each of 243 packs, the A-score distribution being defined by successive terms of the binomial $(\tfrac{2}{3} + \tfrac{1}{3})^5$, i.e. the set-up is

    No. of cards per pack     1     2     3     4     5     6
    No. of packs             32    80    80    40    10     1

                          No. of cards per pack ($x_a$)
                        1      2      3      4      5      6     Total
    Player’s     1    192    240    160     60     12      1      665
    score        2      .    240    160     60     12      1      473
    ($x_b$)      3      .      .    160     60     12      1      233
                 4      .      .      .     60     12      1       73
                 5      .      .      .      .     12      1       13
                 6      .      .      .      .      .      1        1
    Total             192    480    480    240     60      6     1458
    Mean                1     1.5    2.0    2.5    3.0    3.5    11/6

$$\frac{V(M_{b.a})}{V_b} = \frac{6}{19} = r_{ab}^2 = \frac{3q}{6 + (N - 5)p}$$
* * * * * * * 


Models of this class are not without relevance to practical affairs, since they illustrate the genesis of a coefficient of correlation possibly serviceable as a summarising index of the influence of birth order on the incidence of rare congenital conditions if they occur singly in a sibship. If parity has no effect, the long run expectations w.r.t. birth rank of affected individuals in a sibship of $s$ members will be equal for rank 1, 2 . . . $s$. A grid lay-out with family size as one set of border scores and birth rank of affected individuals as the alternate set will then exhibit a triangular contour; and the birth rank mean will increase by equal steps w.r.t. equally spaced family size. On the same assumption there will be a correlation between the two sets of scores; and the product-moment coefficient must be less than unity. To the extent that affected individuals crop up mostly among first births or towards the end of the family, the observed value of the coefficient will approach zero or unity respectively. Thus the discrepancy between the value of the product-moment coefficient ($r_o$) computed from observed data and its value ($r_{ab}$) computed in accordance with the null hypothesis of equal expectation is indicative of the extent to which birth rank influences the occurrence. If $r_o$ is the product-moment coefficient for the observed bivariate distribution of affected individuals to each of which we assign one score ($x_a$) in virtue of family size and one score ($x_b$) in virtue of birth rank, and $r_{ab}$ is the corresponding coefficient computed on the assumption that the expected birth rank of the individual has a rectangular distribution as in models of this class, we may define an index ($B_r$) which has the value zero when there is no birth rank effect, with limits of + 1 and - 1 respectively signifying that all affected individuals are last-born or first-born, viz.:

$$B_r = \frac{(r_o - r_{ab})(2r_or_{ab} - r_o - r_{ab} + 1)}{r_{ab}(1 - r_{ab})}.$$

For this class of models, we have already defined the score $x_a$ as the number of cards in one of a set of packs from each of which the player selects a single card. He records the number of pips as his own score $x_b$ referable to a particular pack the cards of which have 1, 2, 3 . . . pips up to $s$, the size of the pack. No more than one card of a particular denomination is present in
the pack, and the denominations are consecutive, hence the unit sample distribution with respect to any pack is rectangular. In the biological illustration $x_a$ and $x_b$ respectively correspond to family size and birth rank. The mean value of $x_b$ associated with $x_a$ is evidently $\tfrac{1}{2}(x_a + 1)$, so that

$$M_b = \tfrac{1}{2}E_a(x_a + 1) = \tfrac{1}{2}(M_a + 1),$$
$$\therefore\; M_{b.a} - M_b = \tfrac{1}{2}(x_a - M_a).$$

Thus regression is necessarily linear in the B-dimension of the grid, irrespective of the distribution of card pack size, and $k_{ba} = \tfrac{1}{2}$. In virtue of linear regression

$$V(M_{b.a}) = k_{ba}^2 \cdot V_a,$$
$$\therefore\; V(M_{b.a}) = \tfrac{1}{4}V_a.$$

From the elementary property of the rectangular distribution,

$$V_{b.a} = \frac{x_a^2 - 1}{12},$$

and since $E_a(x_a^2) = V_a + M_a^2$,

$$\therefore\; M(V_{b.a}) = \frac{V_a}{12} + \frac{M_a^2 - 1}{12}, \tag{i}$$
$$\therefore\; V_b = \frac{V_a}{4} + \frac{V_a}{12} + \frac{M_a^2 - 1}{12} = \frac{V_a}{3} + \frac{M_a^2 - 1}{12}, \tag{ii}$$
$$\therefore\; r_{ab}^2 = \frac{V(M_{b.a})}{V_b} = \frac{3V_a}{4V_a + M_a^2 - 1}. \tag{iii}$$


The evaluation of $r_{ab}$ now depends solely on the distribution of card pack size, which in our biological illustration is family size ($x_a$). Two cases admit of easy solution:

(a) rectangular distribution of family size from 1 to $N$ as in our first numerical example:

$$M_a = \frac{N + 1}{2} \quad \text{and} \quad V_a = \frac{N^2 - 1}{12},$$
$$\therefore\; r_{ab}^2 = \frac{3(N + 1)}{7N + 13}; \tag{iv}$$

(b) binomial distribution of family size as in our second numerical example. We assume that consecutive values of $x_a$ from 1 to $N$ occur with frequencies defined by successive terms of the binomial $(q + p)^{N-1}$, so that

$$M_a = (N - 1)p + 1 \quad \text{and} \quad V_a = (N - 1)pq,$$
$$\therefore\; V_b = \frac{(N - 1)p}{12}\left[6 + (N - 5)p\right],$$
$$\therefore\; r_{ab}^2 = \frac{3q}{6 + (N - 5)p}. \tag{v}$$
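The two special cases may be verified numerically. The Python sketch below is offered as an illustration rather than as part of the original text: it builds the grid for any distribution of pack size and compares the squared coefficient with $3(N + 1) \div (7N + 13)$ for the rectangular case and with $3q \div [6 + (N - 5)p]$ for the binomial case, taking $N = 6$ and $p = \tfrac{1}{3}$ as in the two numerical examples. The function name `r_squared_unit_sample` is an assumption.

```python
from fractions import Fraction
from math import comb

def r_squared_unit_sample(pack_counts):
    """pack_counts maps pack size x_a to the number of packs of that size.
    Each pack holds cards numbered 1..x_a and the player draws one card."""
    total = sum(pack_counts.values())
    joint = {}
    for size, count in pack_counts.items():
        for card in range(1, size + 1):          # rectangular within a pack
            joint[(size, card)] = Fraction(count, total * size)
    ma = sum(p * xa for (xa, _), p in joint.items())
    mb = sum(p * xb for (_, xb), p in joint.items())
    va = sum(p * (xa - ma) ** 2 for (xa, _), p in joint.items())
    vb = sum(p * (xb - mb) ** 2 for (_, xb), p in joint.items())
    cov = sum(p * (xa - ma) * (xb - mb) for (xa, xb), p in joint.items())
    return cov ** 2 / (va * vb)

N = 6
p = Fraction(1, 3)
q = 1 - p

rectangular = {size: 1 for size in range(1, N + 1)}
# Terms of (2/3 + 1/3)^5 scaled by 243: 32, 80, 80, 40, 10, 1 packs.
binomial = {k + 1: comb(N - 1, k) * 2 ** (N - 1 - k) for k in range(N)}

case_a = r_squared_unit_sample(rectangular)
case_b = r_squared_unit_sample(binomial)
assert case_a == Fraction(3 * (N + 1), 7 * N + 13)
assert case_b == 3 * q / (6 + (N - 5) * p)
print(case_a, case_b)                            # 21/55 6/19
```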
The variance of the player’s score in the unit sample rectangular model set-up admits of a unique partition in terms of the contribution attributable to the circumstance that the card packs are of unequal size in a set of trials only if we agree on a quite arbitrary prescription of how we propose to eliminate the source of variation. The problem admits of a simple solution if our prescription is to replace each individual card pack of a set of unit trials by one and the same pack of $M_a$ cards, as is possible only if $M_a$ is an integer. For the residual (unexplained) variance $V_u$, when we eliminate in this way the source of variation arising from the fact that the card packs are not all of the same size, we may then write

$$V_u = \frac{M_a^2 - 1}{12}.$$

Hence from (ii) above, since $V(M_{b.a}) = \tfrac{1}{4}V_a$:

$$V_b - V_u = \frac{V_a}{3} = \frac{4}{3}V(M_{b.a}).$$

It is however arguable that we might pool the $N$ packs of variable size in each $N$-fold set of trials and record the result of taking $N$ unit samples with replacement from the composite pack. Such a procedure admits of no singular solution unless we specify the distribution of card pack size, and hence of the unit sample distribution of the composite pack.


EXERCISE 12.04 


Examine the joint distribution of the player’s score at a single trial and the number of cards in the pack 
each consisting of cards consecutively numbered from 3 upwards when the distribution of the packs 
are as follows : 


Size of Pack 1 2 3 4 5


No. of Packs 
(a) 1 1 1 1 1 
(b) 1 2 3 
(c) 1 LAA 1 


12.05 LEXIS MODELS


The rule of the following model, so named for reasons mentioned in Chapter 9 of Vol. I, is that the player draws the same number ($s$) of balls from each of $u$ urns, each containing $n$ balls of which a certain number ($x_a$) are red, the player’s score ($x_b$) at a given trial being the number of red balls in the $s$-fold sample. Within this framework, we may distinguish two solutions according as the player does or does not replace before drawing another ball. When there is replacement the sample distribution is given by the terms of the binomial

$$[(n - x_a) + x_a]^s \div n^s.$$

When there is no replacement the definitive binomial is, of course,

$$[(n - x_a) + x_a]^{(s)} \div n^{(s)}.$$

[Fig. 94: 3-fold draw from a stratified (Lexian) universe. Grid of urn score against sample score, with marginal means and variances and weighted totals.]

Fig. 94. Correlation in a Lexian Universe. The player draws with replacement 3 balls from each of 4 urns, respectively containing red : black in the ratios 4 : 1, 3 : 2, 3 : 2 and 2 : 3. The player’s score is the number of black balls in the sample. The urn score is the number of black in every 5 balls in the urn.


Example 1. Six urns each contain 5 balls of which 0, 1, 2 . . . 5 are respectively red. The player takes 3 balls with replacement from each urn. The frequencies (× 750) are:

                           Red balls per urn ($x_a$)
                         0      1      2      3      4      5     Total
    Red balls     0    125     64     27      8      1      .      225
    per sample    1      .     48     54     36     12      .      150
    ($x_b$)       2      .     12     36     54     48      .      150
                  3      .      1      8     27     64    125      225
    Total              125    125    125    125    125    125      750

$$M_a = 2\tfrac{1}{2}; \quad M_b = 1\tfrac{1}{2}; \quad V_a = \tfrac{35}{12};$$
$$V(M_{b.a}) = \tfrac{21}{20}; \quad M(V_{b.a}) = \tfrac{2}{5}; \quad V(M_{b.a}) + M(V_{b.a}) = \tfrac{29}{20} = V_b;$$
$$\operatorname{Cov}(x_a, x_b) = \tfrac{7}{4} = k_{ba} \cdot V_a;$$
$$\frac{V(M_{b.a})}{V_b} = \frac{21}{29} = r_{ab}^2 = \frac{s(u + 1)}{s(u + 1) + 2(u - 2)}.$$

[Fig. 95: 3-fold draw from the parent homogeneous (Bernoullian) universe. Grid laid out as in Fig. 94.]

Fig. 95. Homogeneous sampling from 4 urns each equivalent to the pooled contents of the urns in Fig. 94.
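Example 2. Six urns each contain 5 balls of which 0, 1, 2 . . . 5 are respectively red. The player may choose without replacement 3 balls from each urn. The frequencies (× 60) are:

                           Red balls per urn ($x_a$)
                         0      1      2      3      4      5     Total
    Red balls     0     10      4      1      .      .      .       15
    per sample    1      .      6      6      3      .      .       15
    ($x_b$)       2      .      .      3      6      6      .       15
                  3      .      .      .      1      4     10       15
    Total               10     10     10     10     10     10       60

$$V(M_{b.a}) = \tfrac{21}{20}; \quad M(V_{b.a}) = \tfrac{1}{5}; \quad V(M_{b.a}) + M(V_{b.a}) = \tfrac{5}{4} = V_b;$$
$$\operatorname{Cov}(x_a, x_b) = \tfrac{7}{4} = k_{ba} \cdot V_a; \quad r_{ab}^2 = \tfrac{21}{25} = \frac{V(M_{b.a})}{V_b}.$$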


Example 3.—The player chooses, as in Example 1, 3 balls from the same 6 different sorts of urns 
each containing 5 balls, replacing each ball drawn before taking another. There are however 32 urns 
instead of 6, the $x_a$ score distribution being binomial in accordance with the terms of $(\frac{1}{2} + \frac{1}{2})^5$, i.e. the 
urns are as follows : 

No. of red balls per urn .   0    1    2    3    4    5 
No. of urns .            .   1    5   10   10    5    1 

                         Red balls per urn (x_a) 
                      0     1     2     3     4     5   Total   M_{a.b}   V_{a.b} 
Red balls     0      25    64    54    16     1     0    160     7/5      79/100 
per sample    1       0    48   108    72    12     0    240    11/5      33/50 
(x_b)         2       0    12    72   108    48     0    240    14/5      33/50 
              3       0     1    16    54    64    25    160    18/5      79/100 
Total                25   125   250   250   125    25    800      -         - 

$V(M_{a.b}) = \frac{269}{500}$ ; $M(V_{a.b}) = \frac{89}{125}$ ; $V(M_{a.b}) + M(V_{a.b}) = \frac{5}{4} = V_a$ ; 
$V(M_{b.a}) = \frac{9}{20}$ ; $M(V_{b.a}) = \frac{3}{5}$ ; $V(M_{b.a}) + M(V_{b.a}) = \frac{21}{20} = V_b$ ; 

$$r_{ab}^2 = \frac{V(M_{b.a})}{V_b} = \frac{3}{7} = \frac{sV_a}{(s-1)V_a + M_a(n - M_a)}$$

In all the foregoing, we suppose that there are u urns containing an equal number (n) of 
balls some red, some black, the proportion of red balls in the ath urn being $p_a$ and the actual 
number ($np_a$) being the A (column border) score ($x_a$) of the grid. The player draws a sample 
of s balls from each urn (with or without replacement) and records as the B (row border) score 
($x_b$) the number of red balls in each sample. The symbol $M_a$ stands for the mean number of 
red balls per urn. With or without replacement, the mean score of an s-fold sample from the ath 
urn is $sp_a$, so that 

$$M_{b.a} = sp_a = \frac{s}{n}x_a ; \quad M_b = \frac{s}{n}E(x_a) = \frac{s}{n}M_a ;$$

$$M_{b.a} - M_b = \frac{s}{n}(x_a - M_a) ; \quad \therefore k_{ba} = \frac{s}{n} \qquad \text{(i)}$$


Thus regression of the player's score on the A-score is necessarily linear. It is then a property 
of the score-frequency grid, as shown in 11.04, that 

$$V(M_{b.a}) = V_b - M(V_{b.a}) = k_{ba}^2 V_a ;$$

$$\therefore V(M_{b.a}) = \frac{s^2}{n^2}V_a \qquad \text{(ii)}$$


It will suffice to develop $r_{ab}^2$ in accordance with the condition that there is no replacement, since 
the alternative case is deducible therefrom as a limiting condition. If the player draws s balls 
simultaneously from an urn containing $x_a$ red balls and $b_a = (n - x_a)$ black ones, the distribution 
function for the $x_b$ score is $(x_a + b_a)^{(s)} \div n^{(s)}$ ; and its variance w.r.t. such an urn is given by 

$$V_{b.a} = \frac{s(n-s)}{n-1}p_a(1 - p_a) = \frac{s(n-s)}{n^2(n-1)}x_a(n - x_a).$$

Now $M(V_{b.a}) = E_a(V_{b.a})$ and 

$$E_a[x_a(n - x_a)] = nM_a - E_a(x_a^2) = nM_a - (V_a + M_a^2) = M_a(n - M_a) - V_a ;$$

$$\therefore M(V_{b.a}) = \frac{s(n-s)}{n^2(n-1)}[M_a(n - M_a) - V_a].$$


Since $V_b = M(V_{b.a}) + V(M_{b.a})$, we obtain from (ii) above 

$$V_b = \frac{s(n-s)}{n^2(n-1)}M_a(n - M_a) - \frac{s(n-s)}{n^2(n-1)}V_a + \frac{s^2}{n^2}V_a ;$$

$$\therefore V_b = \frac{s(n-s)}{n^2(n-1)}M_a(n - M_a) + \frac{s(s-1)}{n(n-1)}V_a \qquad \text{(iii)}$$


Since $\text{Cov}(x_a, x_b) = k_{ba}V_a$ when regression of $x_b$ on $x_a$ is linear, 

$$r_{ab}^2 = \frac{k_{ba}^2V_a^2}{V_aV_b} = \frac{s^2V_a}{n^2V_b}.$$

Whence from (iii) above 

$$r_{ab}^2 = \frac{s(n-1)V_a}{n(s-1)V_a + (n-s)M_a(n - M_a)} \qquad \text{(iv)}$$

When n = s in (iv), (n − s) = 0 and s(n − 1) = n(s − 1), so that $r_{ab}^2 = 1$. This is necessarily 
so, since the player then empties each urn. If n is very large in comparison with s, we may write 
(n − 1) ≈ n ≈ (n − s), so that (iv) reduces to 

$$r_{ab}^2 = \frac{sV_a}{(s-1)V_a + M_a(n - M_a)} \qquad \text{(v)}$$

From the algebraic viewpoint, it is usually immaterial whether we impose the restriction 
of replacement regardless of the size of the universe (n) or consider that n is indefinitely large 
compared with s. If so, (v) is definitive of the replacement condition ; but a peculiarity of this 
set-up is that either (iv) or (v) reduces to zero as n approaches infinity in virtue of the factor 
$(n - M_a)$ in the denominator. For that reason, it is appropriate to develop (v) independently 
from the variance formula of the non-replacement distribution when n is small : 


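$$V_{b.a} = sp_a(1 - p_a) = \frac{s}{n^2}x_a(n - x_a) ;$$

$$\therefore M(V_{b.a}) = \frac{s}{n^2}E_a[x_a(n - x_a)] = \frac{s}{n}M_a - \frac{s}{n^2}(V_a + M_a^2) ;$$

$$\therefore V_b = \frac{s^2}{n^2}V_a + \frac{s}{n^2}[M_a(n - M_a) - V_a] = \frac{s(s-1)}{n^2}V_a + \frac{s}{n^2}M_a(n - M_a) \qquad \text{(vi)}$$

In virtue of linear regression we may write as before 

$$r_{ab}^2 = \frac{s^2V_a}{n^2V_b} = \frac{sV_a}{(s-1)V_a + M_a(n - M_a)}.$$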

Both (v) and (vi) uniquely depend on the distribution of the A-score, i.e. that of the urn 
composition. If the u urns are all different and the A-scores run consecutively by unit steps as in 
Examples (1) and (2) the distribution is rectangular, and 

$$V_a = \frac{u^2 - 1}{12}.$$

If the minimum number of red balls in an urn is m, the mean value ($M_a$) of $x_a$ is $m + \frac{1}{2}(u - 1)$ ; 
and if m = 0, (v) becomes 

$$r_{ab}^2 = \frac{s(u+1)}{(s-1)(u+1) + 3(2n - u + 1)}.$$
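
The reader who wishes to check (iv) and (v) numerically may build the Lexian grid directly. The following is a minimal sketch, not part of the original text: the function names `lexian_grid`, `r_squared`, `formula_iv` and `formula_v` are illustrative, and the urns are assumed to be equally likely to be sampled. It computes $r_{ab}^2$ from the complete grid for the six urns of Examples 1 and 2 and compares it with the formulae above.

```python
from fractions import Fraction
from math import comb

def lexian_grid(urn_reds, n, s, replace):
    """Joint probabilities of (x_a, x_b): urn score and red count in an s-fold sample."""
    u = len(urn_reds)
    grid = {}
    for xa in urn_reds:
        for xb in range(s + 1):
            if replace:
                p = Fraction(xa, n)
                pr = comb(s, xb) * p**xb * (1 - p)**(s - xb)   # binomial term
            else:
                # hypergeometric term; comb() returns 0 for impossible draws
                pr = Fraction(comb(xa, xb) * comb(n - xa, s - xb), comb(n, s))
            grid[(xa, xb)] = grid.get((xa, xb), 0) + pr / u
    return grid

def r_squared(grid):
    """Square of the product-moment coefficient, computed exactly from the grid."""
    E = lambda f: sum(pr * f(a, b) for (a, b), pr in grid.items())
    Ma, Mb = E(lambda a, b: a), E(lambda a, b: b)
    Va = E(lambda a, b: a * a) - Ma**2
    Vb = E(lambda a, b: b * b) - Mb**2
    cov = E(lambda a, b: a * b) - Ma * Mb
    return cov**2 / (Va * Vb)

def formula_iv(Ma, Va, n, s):   # no replacement
    return s * (n - 1) * Va / (n * (s - 1) * Va + (n - s) * Ma * (n - Ma))

def formula_v(Ma, Va, n, s):    # replacement
    return s * Va / ((s - 1) * Va + Ma * (n - Ma))

urns = list(range(6))                      # 0, 1, ... 5 red balls out of n = 5
Ma, Va = Fraction(5, 2), Fraction(35, 12)  # rectangular: V_a = (u^2 - 1)/12
print(r_squared(lexian_grid(urns, 5, 3, True)), formula_v(Ma, Va, 5, 3))
print(r_squared(lexian_grid(urns, 5, 3, False)), formula_iv(Ma, Va, 5, 3))
```

Exact fractions are used throughout so that agreement between the grid and the formula is an identity rather than an approximation.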


We may now enquire in what sense we can partition $V_b$, the variance of the player's score 
distribution, into two components $V_e$ and $V_u$ respectively definitive of explained and unexplained 
variation, i.e. of variation arising from and independent of the source, viz. the circumstance that 
the urns do not all contain the same number of red balls. We can give the issue so defined a 
unique meaning only if we regard the individual urns as sub-samples of a single universe which 
we can reconstruct on a Bernoullian basis, i.e. as a homogeneous system, by mixing the contents 
of the u urns. How much ($V_e$) of the variance of the B-score distribution then arises from the 
single relevant circumstance that the universe is stratified in virtue of the heterogeneity of the 
individual urns, we may then assign by deducting what the variance ($V_u$) of the player's score 
would be if allowed to take s-fold samples from the reconstituted (Bernoullian) universe of un 
balls. Though there is nothing arbitrary about such a definition of explained variation, it is 
important at the outset to recognise that it admits a unique solution only in virtue of an arbitrary 
property of the model, viz. that each sub-universe contains the same number (n) of items. With 
due regard to this limitation we proceed as follows. 
If $f_a$ is the number of urns containing $x_a$ red balls in the stratified universe, $n_a$ is the total 
number of red balls and $M_a$ the mean number per urn : 

$$u = \sum_a f_a ; \quad n_a = \sum_a f_a x_a ; \quad M_a = \frac{n_a}{u}.$$

The proportion of red balls in the universe as a whole is therefore 

$$p_u = \frac{n_a}{un} = \frac{M_a}{n}.$$


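If we define the unexplained component of variation ($V_u$) as above we may therefore write 

with replacement 

$$V_u = s\frac{M_a}{n}\left(1 - \frac{M_a}{n}\right) = \frac{s}{n^2}M_a(n - M_a) \qquad \text{(vii)}$$

without replacement 

$$V_u = \frac{s(un - s)}{un - 1}\cdot\frac{M_a}{n}\left(1 - \frac{M_a}{n}\right) = \frac{s(un - s)}{n^2(un - 1)}M_a(n - M_a) \qquad \text{(viii)}$$

In either case by definition 

$$V_e = V_b - V_u.$$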


Whence from (vii) and (vi) sampling with replacement implies 

$$V_e = \frac{s(s-1)}{n^2}V_a.$$

Hence from (ii) 

$$V_e = \frac{s-1}{s}V(M_{b.a}) \quad \text{and} \quad V_u = M(V_{b.a}) + \frac{1}{s}V(M_{b.a}) \qquad \text{(ix)}$$


When there is no replacement, we obtain from (viii) and (iii) 

$$V_e = \frac{s(n-s)}{n^2(n-1)}M_a(n - M_a) - \frac{s(un-s)}{n^2(un-1)}M_a(n - M_a) + \frac{s(s-1)}{n(n-1)}V_a$$

$$= \frac{n(s-1)}{s(n-1)}V(M_{b.a}) - \frac{s(s-1)(u-1)}{n(n-1)(un-1)}M_a(n - M_a) \qquad \text{(x)}$$

The last expression reduces to (ix) when n is indefinitely large. 

In seeking for a partition of variance referable to the source, viz. stratification of the 
universe, we have here explored the result of removing the source of variation by deducting the 
variance of the player's score in sampling from a parent (Fig. 95) Bernoullian universe, i.e. 
a universe constructed by pooling the contents of the urns and extracting s-fold samples from the 
composite urn so constituted. Provided $n_a$ is an exact multiple of u, we may of course replace 
the composite urn by u urns each containing n balls of which the proportion of red balls is $p_u$. 
When this is so the variance of the player's mean score, i.e. $V(M_{b.a})$, is necessarily zero and 
$V_b = M(V_{b.a})$. If the condition of replacement holds good $V_{b.a}$ does not depend on the total 
number of balls in the urn and it is immaterial whether we define in one or other way stated above 
what condition we impose when we make the sampling process homogeneous. This is not so 
when there is no replacement. Equations (vii) and (viii) do not then define the alternative stated 
above, viz. sampling from a universe of u urns each containing n balls and each having the same 
proportionate composition as the composite urn containing un balls. To that extent we may 
regard the criterion of explanation as ambiguous for the case of non-replacement.* 
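
The partition into explained and unexplained components lends itself to a short numerical check. The sketch below is illustrative only: the function name `partition` is not from the text, and it simply evaluates (iii) or (vi) for $V_b$ and (vii) or (viii) for $V_u$, taking $V_e$ as their difference, for the six urns of Examples 1 and 2.

```python
from fractions import Fraction

def partition(urn_reds, n, s, replace):
    """Return (V_b, V_u, V_e) for s-fold samples from u urns of n balls each."""
    u = len(urn_reds)
    Ma = Fraction(sum(urn_reds), u)
    Va = Fraction(sum(x * x for x in urn_reds), u) - Ma**2
    pooled = Ma * (n - Ma)                      # the recurring factor M_a(n - M_a)
    if replace:
        Vb = Fraction(s * (s - 1), n * n) * Va + Fraction(s, n * n) * pooled       # (vi)
        Vu = Fraction(s, n * n) * pooled                                           # (vii)
    else:
        Vb = (Fraction(s * (n - s), n * n * (n - 1)) * pooled
              + Fraction(s * (s - 1), n * (n - 1)) * Va)                            # (iii)
        Vu = Fraction(s * (u * n - s), n * n * (u * n - 1)) * pooled                # (viii)
    return Vb, Vu, Vb - Vu

urns = list(range(6))                 # 0, 1, ... 5 red balls out of n = 5
print(partition(urns, 5, 3, True))    # with replacement
print(partition(urns, 5, 3, False))   # without replacement
```

With replacement the explained component comes out as $(s-1)/s$ times $V(M_{b.a}) = \frac{21}{20}$, in agreement with (ix).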

With one notable exception, it is a common property of the class of models under discussion 
that regression is always linear in one dimension and in one dimension only. A special case, 
illustrated by Example 1 above, arises when: (i) we sample without replacement from a strati- 
fied universe of which each of (n + 1) sub-universes contains the same number of items ; (ii) the 
unit trial expectation from one or other sub-universe is 0, 1,2 ...m. If the conditions last 
stated hold good, our numerical example illustrates two properties peculiar to the situation : 


(a) the distribution of the row-means like that of the column border-scores is rectangular : 


(b) regression of the row-scores on the column-scores is linear as is generally true of re- 
gression of column-scores on row-scores for other models here dealt with. 


Both peculiarities of this set-up are derivable by recourse to properties of figurate numbers 
dealt with in 11.07. We first recall that 


z=(n—s) 
2 Ferro Pesar Tea = (A+ Der 
z=0 


c=n f an 1 (s + 1) 
ove = Ci x(n D TTA AR a (xi) 


If the number of columns in the grid is (n + 1) and the A-scores run from 0 to n consecutively, 
the last expression defines the total frequency entries in a single row of the non-replacement 
Lexian grid in terms of the number of urns (n + 1) and fixed sample size (s). Since this does 
not depend on the row-score (r) or the column-score (c), (xi) therefore establishes the conclusion 
that the distribution of row-means is rectangular. The conclusion that the row-means increase 


as an arithmetic progression follows from the figurate form of the general expression for the 


row-means, vz. : 
C= 


> Ceti = Oli 


Me. r = 5 : ; ; ij 
l (n + Icey aD 


* The partition of variance defined by (viii-ix) which is equivalent to the familiar formula for the difference 
between the variance of the distribution of sampling in the Lexian and Bernoullian universes derives its rationale 
from postulates totally different from those which justify the identification of explained variation with the so-called 
best estimate of the variance of the column means in a one-way classification of scores of a grid such as one which 
summarises a plot yield experiment. A Fisher score-grid of this type specifies a single sample from a universe of 
scores and any parameters based thereon are therefore estimates. The Lexian score-frequency grid with which we 
here deal is a complete distribution summarising the results of extracting an indefinitely large number of all possible 
samples, and the appearance of the factor (s − 1) ÷ s on the right hand side of (x) has no connexion with the problem 
of estimation. ‘The equation itself is an exact description of how much variance arises from the circumstance of 
stratification in the entire universe of choice. 


From (xv) of 11.07 we know that 

$$\sum_{c=0}^{n} c\binom{c}{r}\binom{n-c}{s-r} = (r+1)\binom{n+2}{s+2} - \binom{n+1}{s+1}.$$

By substitution in (xii), we obtain 

$$M_{a.r} = \frac{(r+1)(n+2)}{s+2} - 1 = \frac{r(n+2)}{s+2} + \frac{n-s}{s+2} ;$$

$$\therefore M_a = E_r(M_{a.r}) = \frac{M_b(n+2)}{s+2} + \frac{n-s}{s+2} ;$$

$$\therefore (M_{a.r} - M_a) = \frac{n+2}{s+2}(r - M_b).$$

Thus there is linear regression of the column-score on the row-score, the regression 
coefficient being 

$$k_{ab} = \frac{n+2}{s+2}.$$
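
Both peculiarities of the non-replacement grid with urns running from 0 to n red balls are easy to confirm by enumeration. The following sketch is an illustration under that assumption (the function name `row_stats` is hypothetical); it lists the total frequency and the mean column-score for each row-score r.

```python
from fractions import Fraction
from math import comb

def row_stats(n, s):
    """Row totals and row means of the non-replacement grid with urns c = 0 ... n."""
    rows = []
    for r in range(s + 1):
        # unnormalised cell entries of row r: ways of drawing r red and s - r black
        cells = [comb(c, r) * comb(n - c, s - r) for c in range(n + 1)]
        total = sum(cells)
        mean = Fraction(sum(c * w for c, w in enumerate(cells)), total)
        rows.append((total, mean))
    return rows

for r, (total, mean) in enumerate(row_stats(5, 3)):
    print(r, total, mean)
```

For n = 5 and s = 3 every row total is the same, so that the distribution of row-scores is rectangular, and successive row means differ by (n + 2)/(s + 2) = 7/5.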


[Fig. 96: the sample size model; score-frequency grid of sample size (1 to 5 balls) against sample score (0 to 5), with totals and means; the cell entries are not reproduced here.]

Fig. 96. Correlation between size of sample and score (number of red balls chosen with replacement) from same urn. 

EXERCISE 12.05 


1. Repeat the computations shown above for Examples 1 and 2 when the number of urns is 
7 and the number of red balls is 0, 1, 2 . . . 6. 


2. Repeat the computations for Example 3 when each of 64 urns contains 6 balls as follows : 


No. of red balls per urn: 0 1 2 3 4 5 6 
No. of urns : 1 6 15 20 15 6 1 


12.06 SAMPLE SIZE AS A SOURCE OF VARIATION 


We shall now consider a set-up in which one score (x,) is the size of the sample the player takes 
from one and the same urn of n balls, recording as his score (x,) the number of red ones in the 
sample. We denote the fixed proportion of red balls in the urn by p, so that there are pn red 
ones in the urn and qn = (1 — p)n other balls. This prescription admits two variants in virtue 
of the possibility of imposing the replacement condition or otherwise. 


Example 1.—The player draws from an urn containing 12 balls of which 8 are red, and separately 
records the result of taking samples of 1, 2, 3, 4 or 5 balls without replacement. 


Sample size (xq) 


Player’s score 


($x_b$) 
Example (2).—The player replaces each ball before drawing another but the set-up is otherwise 
as for Example (1), so that $k_{ba} = p = \frac{2}{3}$. 

                           Sample size (x_a) 
                     1     2     3     4     5    Total   M_{a.b}    V_{a.b} 
                0   81    27     9     3     1     121    179/121   9462/14641 
                1  162   108    54    24    10     358    343/179   35754/32041 
Player's score  2    0   108   108    72    40     328    257/82    6849/6724 
(x_b)           3    0     0    72    96    80     248    125/31    588/961 
                4    0     0     0    48    80     128    37/8      15/64 
                5    0     0     0     0    32      32    5         0 
Total              243   243   243   243   243    1215    3         2 
M_{b.a}            2/3   4/3   2     8/3   10/3   2       -         - 
V_{b.a}            2/9   4/9   2/3   8/9   10/9   -       -         - 


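$M(V_{a.b}) = 0.818$ ; $M(V_{b.a}) = \frac{2}{3}$ ; $V(M_{a.b}) = 1.182$ ; $V(M_{b.a}) = \frac{8}{9} = k_{ba}^2V_a$ ; 
$M(V_{a.b}) + V(M_{a.b}) = 2 = V_a$ ; $M(V_{b.a}) + V(M_{b.a}) = \frac{14}{9} = V_b$ ; 

$$\frac{V(M_{a.b})}{V_a} = 0.591 ; \quad \frac{V(M_{b.a})}{V_b} = \frac{4}{7} = r_{ab}^2$$

* * * * * * * 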


If $x_b$ is, as defined above, the score of the player 

$$M_{b.a} = x_a p ; \quad M_b = pE(x_a) = pM_a ;$$

$$M_{b.a} - M_b = p(x_a - M_a).$$

Thus regression is linear w.r.t. the player's score on the A-score, and 

$$k_{ba} = p ; \quad \therefore V(M_{b.a}) = p^2V_a \qquad \text{(i)}$$

If there is no replacement 

$$V_{b.a} = \frac{x_a(n - x_a)}{n-1}pq ;$$

$$M(V_{b.a}) = \frac{pq}{n-1}[nE_a(x_a) - (V_a + M_a^2)] = \frac{pq}{n-1}[M_a(n - M_a) - V_a] \qquad \text{(ii)}$$

From (i) and (ii) we obtain 

$$V_b = M(V_{b.a}) + V(M_{b.a}) = \frac{pq}{n-1}M_a(n - M_a) - \frac{pq}{n-1}V_a + p^2V_a ;$$

$$\therefore V_b = \frac{pq}{n-1}M_a(n - M_a) + \frac{p(np-1)}{n-1}V_a \qquad \text{(iii)}$$

We thus obtain 

$$r_{ab}^2 = \frac{p^2V_a}{V_b} = \frac{p(n-1)V_a}{qM_a(n - M_a) + (np-1)V_a} \qquad \text{(iv)}$$

When there is replacement 

$$V_{b.a} = x_apq ; \quad M(V_{b.a}) = M_apq ;$$

$$V_b = M_apq + p^2V_a \qquad \text{(v)}$$

$$\therefore r_{ab}^2 = \frac{pV_a}{qM_a + pV_a} \qquad \text{(vi)}$$

The last expression is of course the limiting form of (iv) when n is very large, so that (n − 1) 
≈ n ≈ $(n - M_a)$ and (np − 1) ≈ np. 
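
A direct enumeration of the sample size model confirms (iv) and (vi) for the urn of Examples 1 and 2 (12 balls of which 8 are red, sample sizes 1 to 5). The sketch below is illustrative; the names `size_grid`, `r_squared`, `formula_iv` and `formula_vi` are not from the text, and each sample size is assumed to be equally frequent.

```python
from fractions import Fraction
from math import comb

def size_grid(sizes, n, red, replace):
    """Joint probabilities of (x_a, x_b): sample size and number of red balls drawn."""
    p = Fraction(red, n)
    grid = {}
    for xa in sizes:
        for xb in range(xa + 1):
            if replace:
                pr = comb(xa, xb) * p**xb * (1 - p)**(xa - xb)
            else:
                pr = Fraction(comb(red, xb) * comb(n - red, xa - xb), comb(n, xa))
            grid[(xa, xb)] = pr / len(sizes)
    return grid

def r_squared(grid):
    """Square of the product-moment coefficient, computed exactly from the grid."""
    E = lambda f: sum(pr * f(a, b) for (a, b), pr in grid.items())
    Ma, Mb = E(lambda a, b: a), E(lambda a, b: b)
    Va = E(lambda a, b: a * a) - Ma**2
    Vb = E(lambda a, b: b * b) - Mb**2
    cov = E(lambda a, b: a * b) - Ma * Mb
    return cov**2 / (Va * Vb)

def formula_iv(p, Ma, Va, n):   # no replacement
    q = 1 - p
    return p * (n - 1) * Va / (q * Ma * (n - Ma) + (n * p - 1) * Va)

def formula_vi(p, Ma, Va):      # replacement
    q = 1 - p
    return p * Va / (q * Ma + p * Va)

sizes, p = [1, 2, 3, 4, 5], Fraction(2, 3)
Ma, Va = Fraction(3), Fraction(2)
print(r_squared(size_grid(sizes, 12, 8, True)), formula_vi(p, Ma, Va))
print(r_squared(size_grid(sizes, 12, 8, False)), formula_iv(p, Ma, Va, 12))
```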

In this set-up the variance of the player's score distribution depends on two circumstances, 
viz. random sampling irrespective of sample size and variation w.r.t. sample size itself. There 
is a unique sense in which we can partition $V_b$ to exhibit how much of it is explicable in virtue of 
the distinctive peculiarity of the model, viz. sample size variation, if we conceive the procedure 
as an indefinitely repeated sequence of sets of trials involving the same distribution of sample 
size per set. So conceived, the number of items chosen in any one such set remains the same 
if for every trial of $x_a$ items we substitute $M_a$, the mean sample size. This involves the restriction 
that $E_a(x_a)$ is an integer. If there is no replacement the residual (unexplained) variance is then 

$$V_u = \frac{M_a(n - M_a)}{n-1}pq \qquad \text{(vii)}$$

Thence from (ii) and (iii) 

$$M(V_{b.a}) = V_u - \frac{pq}{n-1}V_a ;$$

$$V_e = V_b - V_u = \frac{p(np-1)}{n-1}V_a.$$

Since also $V_b = M(V_{b.a}) + V(M_{b.a})$, we thus derive 

$$V_u = M(V_{b.a}) + \frac{q}{p(n-1)}V(M_{b.a}) \qquad \text{(viii)}$$

$$V_e = \frac{np-1}{p(n-1)}V(M_{b.a}) \qquad \text{(ix)}$$

When n is indefinitely large we derive the corresponding expressions for replacement 

$$V_u = M(V_{b.a}) \quad \text{and} \quad V_e = V(M_{b.a}) \qquad \text{(x)}$$


Subject to the condition that there is replacement the variance of the player’s mean score 
distribution therefore tallies exactly with the component of explanation, as is also true of the 


520 CHANCE AND CHOICE BY CARDPACK AND CHESSBOARD 


consequential relation of the player’s score to that of the umpire in the models of 12.01, though 
not of the concurrent relation between the scores of the two players in the same set-up. But 
we arrive at this conclusion by considerations which are in no way comparable with those we 
invoke to interpret a one-way partition of variance of the players’ score distribution w.r.t. the 
contribution of the umpire in the umpire-bonus model. 

A property of the umpire-bonus model in the domain of consequence is that the grid is homo- 
scedastic in the U-dimension, i.e. the variances of the player’s score distributions are the same 
for all values of the bonus. The variance of the players’ score distribution for a fixed contribu- 
tion of the umpire is constant, its mean value being proportionate to the umpire’s score. ‘This 
is not true of the model of this section. 


EXERCISE 12.06 


Repeat the computations of Fig. 96 for an urn containing 12 balls of which 3 are red, the player taking 
samples of 1, 2, 3 or 4. 


12.07 THE ORTHOGONAL LOTTERY MODEL 


The model we shall now examine is a variant on the theme of 12.01, where we have made reference 
to it. It is equivalent to a bonus set-up involving two umpires, the players being passive, and 
therefore a limiting case which arises when the maximum individual score of either of the two 
players is indefinitely small in comparison with that of either umpire. It is, however, worthy 
of special consideration for several reasons. As we shall later see, it has a special bearing on 
the assumptions inherent in the method of factor analysis due to Hotelling. It discloses how 
it is possible to generate a bivariate distribution with zero covariance when the variates are 
not statistically independent. It also gives us a clue to the procedure (Chapter 16) known as 
orthogonal transformation, a device by which it is possible to establish the statistical independence 
of two continuous variates. 

The rule for the model of Fig. 97 is as follows. An umpire spins once each of two wheels 
with marginal scores (1-5) whose frequencies conform to the expansion of $(\frac{1}{2} + \frac{1}{2})^4$. One 
player (A) records as his score ($x_a$) the sum ($x_1 + x_2$) of the scores registered at each trial by one 
or other wheel. The other player (B) records as his score ($x_b$) the difference ($x_1 - x_2$). Fig. 97 
exhibits the joint distribution of the scores of the players. From the symmetry of the cell 
entries of the grid in both dimensions, we may infer that covariance is zero without recourse to 
computation ; but the reader may care to check this conclusion. 

We may regard the model of Fig. 97 as a particular variant on a more general pattern 
involving n wheels each of which the umpire spins once at a trial and n players, to whom the umpire 
allocates some multiple of each wheel-score. We may then represent the system of scoring for 
2 players with 2 wheels by the following pattern : 

$$x_a = A_1x_1 + A_2x_2 \quad \text{and} \quad x_b = B_1x_1 + B_2x_2 \qquad \text{(i)}$$

The mean scores of the players are thus 

$$E(x_a) = A_1E(x_1) + A_2E(x_2) \quad \text{and} \quad E(x_b) = B_1E(x_1) + B_2E(x_2).$$

If the two wheels are identical we may put $E(x_1) = M_x = E(x_2)$, so that 

$$E(x_a) = (A_1 + A_2)M_x \quad \text{and} \quad E(x_b) = (B_1 + B_2)M_x.$$

Thus the players have an equal chance of winning if 

$$(A_1 + A_2) = (B_1 + B_2) \qquad \text{(ii)}$$

[Fig. 97: the orthogonal lottery; the wheel-score distribution (scores 1 to 5 with relative frequencies 1/16, 4/16, 6/16, 4/16, 1/16) and the joint distribution of the players' scores $x_a = x_1 + x_2$ and $x_b = x_1 - x_2$; the cell entries are not reproduced here.]

Fig. 97. The Orthogonal Lottery Model. An umpire spins the wheel twice, one player recording the sum as his 
score, the other the difference. 

For the covariance of their scores we have 

$$\text{Cov}(x_a, x_b) = E(x_a x_b) - E(x_a)E(x_b)$$

$$= E[(A_1x_1 + A_2x_2)(B_1x_1 + B_2x_2)] - (A_1 + A_2)(B_1 + B_2)M_x^2$$

$$= A_1B_1E(x_1^2) - A_1B_1M_x^2 + A_2B_2E(x_2^2) - A_2B_2M_x^2 + (A_1B_2 + A_2B_1)E(x_1x_2) - (A_1B_2 + A_2B_1)M_x^2.$$

If we write $V_x$ for the variance of the single-trial wheel-score distribution, we therefore have 

$$\text{Cov}(x_a, x_b) = (A_1B_1 + A_2B_2)V_x + (A_1B_2 + A_2B_1)\text{Cov}(x_1, x_2).$$

Since the scores $x_1$ and $x_2$ are independent $\text{Cov}(x_1, x_2) = 0$, and 

$$\text{Cov}(x_a, x_b) = (A_1B_1 + A_2B_2)V_x.$$

The condition of zero covariance is therefore that 


$$A_1B_1 + A_2B_2 = 0 \qquad \text{(iii)}$$

For the model of Fig. 97 $A_1 = A_2 = 1 = B_1$ and $B_2 = -1$. Hence this condition holds 
good, as we have seen, and it is not inconsistent with (ii), i.e. the possibility that each player 
has an equal chance of winning. From (ii) and (iii) the condition that there may be no 
correlation between the scores of the two players with an equal chance of winning is 

$$B_1 = \frac{A_2(A_1 + A_2)}{A_2 - A_1} \quad \text{and} \quad B_2 = -\frac{A_1(A_1 + A_2)}{A_2 - A_1} \qquad \text{(iv)}$$

For example, the following system of scoring makes a fair game in that sense, and reproduces 
the essential feature of the model, i.e. zero correlation : $x_a = 3x_1 + 4x_2$ ; $x_b = 28x_1 - 21x_2$. 
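
The reader who cares to check the conclusion for Fig. 97 may do so by enumerating the 25 outcomes of the two spins. The sketch below is an illustration, not part of the original text: the names `WHEEL`, `joint`, `mean`, `covariance` and `is_independent` are hypothetical. It shows that the covariance vanishes although the product rule for independence fails.

```python
from fractions import Fraction
from itertools import product

# wheel scores 1 to 5 with frequencies 1 : 4 : 6 : 4 : 1, as for Fig. 97
WHEEL = {1: Fraction(1, 16), 2: Fraction(4, 16), 3: Fraction(6, 16),
         4: Fraction(4, 16), 5: Fraction(1, 16)}

def joint(A, B):
    """Joint distribution of x_a = A1*x1 + A2*x2 and x_b = B1*x1 + B2*x2."""
    dist = {}
    for (x1, p1), (x2, p2) in product(WHEEL.items(), repeat=2):
        key = (A[0] * x1 + A[1] * x2, B[0] * x1 + B[1] * x2)
        dist[key] = dist.get(key, 0) + p1 * p2
    return dist

def mean(dist, i):
    return sum(pr * key[i] for key, pr in dist.items())

def covariance(dist):
    exy = sum(pr * xa * xb for (xa, xb), pr in dist.items())
    return exy - mean(dist, 0) * mean(dist, 1)

def is_independent(dist):
    """True only if every cell equals the product of its border frequencies."""
    pa, pb = {}, {}
    for (xa, xb), pr in dist.items():
        pa[xa] = pa.get(xa, 0) + pr
        pb[xb] = pb.get(xb, 0) + pr
    return all(dist.get((xa, xb), 0) == pa[xa] * pb[xb] for xa in pa for xb in pb)

fig97 = joint((1, 1), (1, -1))     # sum and difference
fair = joint((3, 4), (28, -21))    # the fair game cited above
print(covariance(fig97), is_independent(fig97))
print(covariance(fair), mean(fair, 0), mean(fair, 1))
```

Dependence is evident without computation: a sum of 2 is possible only if both wheels show 1, in which case the difference is necessarily zero.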
We might complicate the procedure by using three such wheels, the general pattern of the 
scores of three players being then 

$$x_a = A_1x_1 + A_2x_2 + A_3x_3 ;$$
$$x_b = B_1x_1 + B_2x_2 + B_3x_3 ;$$
$$x_c = C_1x_1 + C_2x_2 + C_3x_3.$$

For the 3-wheel case, as the reader may check by the procedure employed above, 

$$\text{Cov}(x_a, x_b) = (A_1B_1 + A_2B_2 + A_3B_3)V_x ;$$
$$\text{Cov}(x_a, x_c) = (A_1C_1 + A_2C_2 + A_3C_3)V_x ;$$
$$\text{Cov}(x_b, x_c) = (B_1C_1 + B_2C_2 + B_3C_3)V_x.$$

The condition of zero covariance is therefore 

$$A_1B_1 + A_2B_2 + A_3B_3 = 0$$
$$A_1C_1 + A_2C_2 + A_3C_3 = 0 \qquad \text{(v)}$$
$$B_1C_1 + B_2C_2 + B_3C_3 = 0$$
More generally we may say that the condition of zero covariance for such a system of scoring is 
that the sum of the products of corresponding constants for any pair of definitive equations is zero. 
This is consistent with the condition that each player has an equal chance of winning. If there 
are only 2 wheels and 2 players, the condition that the variance ($V_a$, etc.) of the score distribution 
of each player is the same is consistent with an equal chance of winning or with zero covariance 
but not with both. For the 3-wheel game we may thus express the variance of the player's score 
distribution 

$$V_a = E(x_a - M_a)^2 = A_1^2E(x_1 - M_1)^2 + A_2^2E(x_2 - M_2)^2 + A_3^2E(x_3 - M_3)^2$$
$$+ 2A_1A_2E(x_1 - M_1)(x_2 - M_2) + 2A_1A_3E(x_1 - M_1)(x_3 - M_3) + 2A_2A_3E(x_2 - M_2)(x_3 - M_3).$$

Whence, since the wheel-scores are independent and have the same variance $V_x$, we obtain 

$$V_a = (A_1^2 + A_2^2 + A_3^2)V_x.$$

Thus equal variance of the scores of the 3-player set-up implies 

$$(A_1^2 + A_2^2 + A_3^2) = (B_1^2 + B_2^2 + B_3^2) = (C_1^2 + C_2^2 + C_3^2) \qquad \text{(vi)}$$

Similarly for the 2-player set-up with only 2 wheels, it is $(A_1^2 + A_2^2) = (B_1^2 + B_2^2)$. The 
set-up in Fig. 13 is consistent with this condition and with zero covariance since 
$(A_1^2 + A_2^2) = 2 = (B_1^2 + B_2^2)$. When the umpire spins two identical wheels, the joint 
conditions of unit variance and zero covariance are 

$$A_1^2 + A_2^2 = 1 = B_1^2 + B_2^2 \quad \text{and} \quad A_1B_1 + A_2B_2 = 0 ;$$


=> oe and es ; i ANDY 


It will be of interest at a later stage to ask what aspect the correlation grid would assume if 
the number of score classes were so large as to tally closely with the continuous normal 
distribution. We have (Chapter 3, Vol. I) seen that the binomial $(\frac{1}{2} + \frac{1}{2})^n$ defines a distribution of 
which this statement is true. Table 1 exhibits the correlation grid of the scores of the players 
when the universe of scores consists of 21 classes (0 to 20 inclusive) so distributed. The system 
of scoring is $x_a = x_1 + x_2$ and $x_b = x_1 - x_2$ as in Fig. 97. 

The ace-contour indicative of zero covariance again asserts itself ; but if our aim is to explore 
whether zero covariance in a normal universe would conform to the requirements of statistical 
independence, we are entitled to group our data in conformity with the implicit assumption of 
continuity as in Table 2. 

Each cell of Table 3 exhibits (above) the corresponding entry of Table 2 expressed as a 
proportionate frequency and (below) the frequency assigned by the product rule in conformity 
with the row and column border scores of Table 2. The correspondence is not very good ; 
and it is therefore clear that a bivariate distribution which closely approaches normality in both 
dimensions does not guarantee independence as a necessary corollary of zero covariance. 


EXERCISE 12.07 


The following are the weights ($A_1$, $A_2$, etc. in the first line, $B_1$, $B_2$, etc. in the second, and so on) for the 
player's score in accordance with the general pattern for player K : $x_k = K_1x_1 + K_2x_2 + K_3x_3$ . . . 
when $x_1$, $x_2$, etc. represent the score at one trial of each of the roulette wheels with scores (1, 2, 3, 4, 5) 
and score frequencies (1 : 4 : 6 : 4 : 1) as in Fig. 97 : 


Wheel 
Player 1 2 3 4 5 
A 1 3 2 4 1 
US + A ae 
C i 2 4 2 0 
E NL E SS ee ee 
E 0 2 1 1 4 


1. Determine the variances of the 5 players’ score distributions and those of the first 3 players 
when they base their scores on those of the first 3 wheels. 


TABLE 2 
0-2 deb 6-8 9-11 12-14 15-17 18-20 TOTAL 
(—10)-(—8) 3 | 211 
(— 7)-(— 5) 21489 
E 4(— 2) 242250 
(— 1)-(+ 1) 520676 
(+ 2)-(+ 4) | 242250 
(+ 5)-(+ 7) 
(+ 8)-(+ 10) 
TOTAL 

(—10)-(— 8) 

(DARA 

E 

a E 

(+ 2)-(+ 4) 

(+ 5)-(+ 7) 

(+ 8)-(+ 10) 


2. Make a table of correlations of players’ scores of the pattern 


A B C D E 


e ve OA ea 
3. Make a second table showing what fraction of the variance ($V_k$) of the score distribution of each 
player is attributable to each score component $K_1x_1$, $K_2x_2$, etc. 


4. Check all the formulae of the preceding section by direct calculation of the above. 


5. A lottery wheel like that shown in Fig. 13 has 8 sectors with scores 2 and 8 each in one sector 
and 4 and 6 each in 3 sectors. Investigate the joint distribution of the players’ scores, if they respectively 
record the score-sum of 2 double spins and the difference between them. 


6. A lottery wheel of 32 sectors carries score values 1 or 16 each in 1 sector, 4 or 13 each in 5 
sectors and 7 or 10 each in 10 sectors. Investigate the players’ joint score distribution if 


(i) A records the score sum of 2 spins, B the score difference ; 
(ii) A records the score sum of 3 spins, B 5 times the score difference of the double spin. 


12.08 THE STANDARD SCORE SYMBOLISM 


We have already used the expression standard score or critical ratio in connexion with normal 
significance tests, its definition being as follows : the standard score is the ratio of the raw-score 
deviation from the mean to the standard deviation of its distribution. In the theory of the 
significance test, our assumption is that both the mean and s.d. in this context are their true, 
in contradistinction to their approximate sample, values which we commonly use to make the 
best of a bad job. So defined, the standard mean score of a sample drawn from a normal uni- 
verse is normally distributed with unit variance ; but the distribution of the approximate standard 
score then approaches normality only if the sample is large, and its correct distribution is that of 
the t-ratio mentioned in Chapter 7 of Vol. I and dealt with in Chapter 16 below. In what follows, 
the reader may judge from the context whether we employ the term in the exact or approximate 
sense. Either way, the definition does not presume a normal distribution; and either way 
standard scores have certain practical advantages, which commend their use, especially in 
situations involving the use of correlation and regression formulae. It is for this reason that 
psychologists employ them extensively. 

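A special advantage is the simplification of the algebra of correlation, since all such scores 
have the same mean, i.e. zero, and the same variance, i.e. unity. We shall here use $z_a$, $z_b$ as 
the standard scores corresponding to the raw scores $x_a$ and $x_b$, i.e. 

$$z_a = \frac{x_a - M_a}{\sigma_a} = \frac{X_a}{\sigma_a} \quad \text{and} \quad z_b = \frac{x_b - M_b}{\sigma_b} = \frac{X_b}{\sigma_b} ;$$

$$E(z_a) = 0 = E(z_b) ; \quad V(z_a) = E(z_a^2) = \frac{E(X_a^2)}{\sigma_a^2} = 1 = V(z_b).$$

By definition we have 

$$r_{ab} = E\left(\frac{X_a}{\sigma_a}\cdot\frac{X_b}{\sigma_b}\right) = E(z_a \cdot z_b).$$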


Thus the product-moment coefficient of two raw-score distributions is the covariance of the 
corresponding standard scores. If there is linear concomitant variation, we may write 


Xa = Ah, ER 7 E eae and Ai == AA = Ai oO 
Vo = E a + Pi et 


We may then put 


A Ras q Ag -0 
= âu, an A 
Ta Oa 
Bio | Boo. 
——=b5, and a = By 
Op Op 


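In standard score form we may then write 

$$z_a = a_uz_u + a_oz_{a.o} ;$$
$$z_b = b_uz_u + b_oz_{b.o} ;$$
$$V_{z_a} = a_u^2 + a_o^2 = 1 = b_u^2 + b_o^2 = V_{z_b} \qquad \text{(i)}$$

Also as above 

$$r_{ab} = E(z_a \cdot z_b) = a_ub_uE(z_u^2) + a_ub_oE(z_u z_{b.o}) + a_ob_uE(z_u z_{a.o}) + a_ob_oE(z_{a.o} z_{b.o}).$$

Since $V_{z_u} = 1$ and the remaining components are independent, we have 

$$r_{ab} = a_ub_u = \sqrt{(1 - a_o^2)(1 - b_o^2)} \qquad \text{(ii)}$$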
If regression of $x_b$ on $x_a$ is linear we have 

$$M_{b.a} - M_b = k_{ba}X_a = k_{ba}(x_a - M_a) ;$$

$$\therefore \frac{M_{b.a} - M_b}{\sigma_b} = k_{ba}\frac{\sigma_a}{\sigma_b}\cdot\frac{x_a - M_a}{\sigma_a}.$$

Where regression is linear we have seen that 

$$k_{ba} = r_{ab}\frac{\sigma_b}{\sigma_a}.$$

Hence the preceding equation becomes 

$$E_a(z_b) = r_{ab} \cdot z_a \qquad \text{(iii)}$$

We may express this result as follows : if regression of the B-score on the A-score is linear, the mean 
value of standard B-score for a fixed value of the standard A-score is obtained by multiplying the 
latter by the correlation coefficient. 
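
Result (iii) may be illustrated with the replacement grid of Example 2 in 12.06, where regression of the player's score on sample size is linear. The sketch below is illustrative only (the function name `standardised_column_means` is hypothetical); for each sample size it returns the mean standard B-score alongside the product of the correlation coefficient and the standard A-score.

```python
from fractions import Fraction
from math import comb, sqrt

def standardised_column_means(sizes=(1, 2, 3, 4, 5), p=Fraction(2, 3)):
    """Pairs (E_a(z_b), r_ab * z_a) for each sample size in the replacement model."""
    grid = {(xa, xb): comb(xa, xb) * p**xb * (1 - p)**(xa - xb) / len(sizes)
            for xa in sizes for xb in range(xa + 1)}
    E = lambda f: sum(pr * f(a, b) for (a, b), pr in grid.items())
    Ma, Mb = E(lambda a, b: a), E(lambda a, b: b)
    sd_a = sqrt(E(lambda a, b: a * a) - Ma**2)
    sd_b = sqrt(E(lambda a, b: b * b) - Mb**2)
    r = float(E(lambda a, b: a * b) - Ma * Mb) / (sd_a * sd_b)
    pairs = []
    for xa in sizes:
        column = {b: pr for (a, b), pr in grid.items() if a == xa}
        mean_b = sum(b * pr for b, pr in column.items()) / sum(column.values())
        mean_z_b = float(mean_b - Mb) / sd_b     # mean standard B-score in this column
        z_a = float(xa - Ma) / sd_a              # standard A-score of this column
        pairs.append((mean_z_b, r * z_a))
    return pairs

for left, right in standardised_column_means():
    print(round(left, 6), round(right, 6))
```

The two columns of figures agree to within rounding error, as (iii) requires.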

To gain familiarity in the use of standard scores, we may here develop the formula for 
partial correlation cited in 9.04 of Chapter 9, Vol. I on the assumption of linear concomitant 
variation. We assume that there are 2 umpires, and in standard form our definitive equations 
are 

$$z_a = a_uz_u + a_wz_w + a_oz_{a.o} ; \quad a_u^2 + a_w^2 + a_o^2 = 1 = V_{z_a} ;$$

$$z_b = b_uz_u + b_wz_w + b_oz_{b.o} ; \quad b_u^2 + b_w^2 + b_o^2 = 1 = V_{z_b}.$$

In this set-up, the total variances of the players' score distribution are $V_{z_a} = 1 = V_{z_b}$. 
Likewise $V_{z_u} = 1 = V_{z_w}$, so that 

$$r_{ab} = a_ub_u + a_wb_w ; \quad r_{au} = a_u ; \quad r_{bu} = b_u ; \quad r_{aw} = a_w ; \quad r_{bw} = b_w.$$

Our aim is to determine the value of the p-m (product-moment) index ($r_{ab.u}$) w.r.t. the A- and 
B-scores, when the source of concomitant variation arises from the W component alone, i.e. 
when $z_u$ remains fixed, so that 

$$V_{z_a.u} = a_w^2 + a_o^2 = 1 - a_u^2 = 1 - r_{au}^2 \quad \text{and} \quad V_{z_b.u} = b_w^2 + b_o^2 = 1 - b_u^2 = 1 - r_{bu}^2 ;$$

$$\text{Cov}(z_{a.u}, z_{b.u}) = a_wb_w = r_{ab} - r_{au}r_{bu} ;$$

$$\therefore r_{ab.u} = \frac{a_wb_w}{\sqrt{V_{z_a.u}V_{z_b.u}}} = \frac{r_{ab} - r_{au}r_{bu}}{\sqrt{(1 - r_{au}^2)(1 - r_{bu}^2)}}.$$

Similarly, we may derive 

$$r_{ab.w} = \frac{r_{ab} - r_{aw}r_{bw}}{\sqrt{(1 - r_{aw}^2)(1 - r_{bw}^2)}} \qquad \text{(iv)}$$

12.09 REGRESSION AND CONCOMITANT VARIATION 


The method we have used to derive the equation of partial correlation in Chapter 9 of Vol. I, 
as also in 12.08 above, and conclusions we have established in connexion with the models of 
12.01-12.06 bring into focus an issue which calls for further comment. With that end in view, 
it is appropriate to recall an important logical distinction, which we may state as follows : 


(i) if B must occur when A occurs but may also occur when A does not occur, we say 
that A is a SUFFICIENT condition of B’s occurrence ; 


(ii) if B cannot occur unless A also occurs, but may not always occur when A does, we 
say that A is a NECESSARY condition of B’s occurrence ; 


(iii) if B cannot occur unless A also occurs, and must occur if A occurs, we say that A is 
both a sufficient and a necessary condition of B’s occurrence. 


We may clarify the distinction, if we invoke two antecedents A and C. If B occurs only 
when both A and C occur, neither A nor C is a sufficient condition of B's occurrence but each 
is a necessary one. If B occurs only when either A or C occurs alone, each is a sufficient but 
neither is a necessary condition of B's occurrence. 

We have shown in this chapter that 


(a) the equation of partial correlation follows from the law of linear concomitant variation 
(L.C.V.), i.e. that L.C.V. is a sufficient condition for its validity ; 


(b) a law of linear regression (L.R.) is not a necessary consequence of L.C.V. and conversely 
L.C.V. is not a necessary accompaniment of L.R. 


Thus linear regression is not a necessary condition of the validity of the equation of partial 
correlation. That it is not a sufficient condition is demonstrable by recourse to the model of 
12.02. We can modify the latter by postulating a third player C who takes c cards from a pack 
without replacement after A and B have respectively taken a and b cards. We shall assume that 
each player records his heart-score. The only meaning we can then attach to the correlation 
($r_{bc.a}$) of the B- and C-scores in the absence of any effect due to the choice of A implies the 
elimination of A’s choice. The formula is then the same as for $r_{ab}$, if we substitute b for a and 
c for b, i.e. 

$$r_{bc.a} = -\sqrt{\frac{bc}{(n-b)(n-c)}} \qquad \text{(i)}$$

To test the validity of the equation of partial correlation in this set-up, we shall require to 
determine $r_{ab}$, $r_{ac}$ and $r_{bc}$ when all three players draw. We already know that 

$$r_{ab} = -\sqrt{\frac{ab}{(n-a)(n-b)}} \qquad \text{(ii)}$$


For $M_a$, $M_b$, $V_a$, $V_b$ we may employ the expressions already derived in 12.02. Since we have 
shown that the prior choice of A does not affect the variance or the mean of the B-score 
distribution, we may also write 

$$M_c = cp \quad\text{and}\quad V_c = \frac{c(n-c)pq}{n-1} \qquad \text{(iii)}$$


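We may write the mean value of $x_c$ for fixed values of $x_a$ and $x_b$ as $M_{c.ab}$, and its value as 

$$M_{c.ab} = \frac{c(np - x_a - x_b)}{n-a-b}$$

In our grid notation 

$$M_{c.a} = E_b(M_{c.ab}) \quad\text{and}\quad M_{c.b} = E_a(M_{c.ab})$$

Whence we derive 

$$M_{c.a} = \frac{c(np - x_a)}{n-a-b} - \frac{c\,E_b(x_b)}{n-a-b} = \frac{c(np - x_a)}{n-a-b} - \frac{bc(np - x_a)}{(n-a)(n-a-b)}$$

$$\therefore M_{c.a} = \frac{c(np - x_a)}{n-a} \qquad \text{(iv)}$$

Similarly 

$$M_{c.b} = \frac{c(np - x_b)}{n-a-b} - \frac{c\,E_a(x_a)}{n-a-b} = \frac{c(np - x_b)}{n-a-b} - \frac{ac(np - x_b)}{(n-b)(n-a-b)}$$

$$\therefore M_{c.b} = \frac{c(np - x_b)}{n-b} \qquad \text{(v)}$$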


From (iv) we derive 


cnp . M, = PTM) O acpq | 


E, RS — 2 — , 
(x ) nu n—a Fs ñ= | 
‘, Contar a) DE, 

n — 


To A ee oS Oo 


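Similarly, we obtain 

$$r_{bc} = -\sqrt{\frac{bc}{(n-b)(n-c)}} \qquad \text{(vii)}$$

From (ii) and (vi) we thus obtain 

$$(1 - r_{ab}^2)(1 - r_{ac}^2) = \frac{n^2(n-a-b)(n-a-c)}{(n-a)^2(n-b)(n-c)}$$

$$r_{ab}\,r_{ac} = \frac{a\sqrt{bc}}{(n-a)\sqrt{(n-b)(n-c)}}$$

Also from (vii) 

$$r_{bc} - r_{ab}\,r_{ac} = -\frac{n}{n-a}\sqrt{\frac{bc}{(n-b)(n-c)}}$$

$$\therefore \frac{r_{bc} - r_{ab}\,r_{ac}}{\sqrt{(1 - r_{ab}^2)(1 - r_{ac}^2)}} = -\sqrt{\frac{bc}{(n-a-b)(n-a-c)}}$$

But we have seen, as in (i) that 

$$r_{bc.a} = -\sqrt{\frac{bc}{(n-b)(n-c)}}$$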


Thus the expressions on the right of the last two equations are unequal, i.e. the relation defined 
by (iv) of 12.08 on the assumption of L.C.V. does not hold good. Nevertheless L.R. continues 
to apply in both dimensions, since we can write (iv) and (v) in the form 

$$M_{c.a} = \frac{ncp - c(X_a + ap)}{n-a} \quad\text{and}\quad M_{c.b} = \frac{ncp - c(X_b + bp)}{n-b}$$

$$\therefore M_{c.a} - M_c = -\frac{c}{n-a}\,X_a \quad\text{and}\quad M_{c.b} - M_c = -\frac{c}{n-b}\,X_b$$
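
The three-player card model lends itself to an exact numerical check. The following sketch enumerates the joint distribution of the three heart-scores and compares the right-hand side of the equation of partial correlation with the value obtained for B and C when A's choice is eliminated. The pack size, number of hearts and hand sizes are illustrative assumptions, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb, sqrt

def joint_heart_scores(n, hearts, a, b, c):
    """Exact joint distribution of (x_a, x_b, x_c) when A, B and C take
    a, b and c cards in turn, without replacement, from an n-card pack."""
    others = n - hearts
    total = comb(n, a) * comb(n - a, b) * comb(n - a - b, c)
    dist = {}
    for xa, xb, xc in product(range(a + 1), range(b + 1), range(c + 1)):
        # Hearts and non-hearts still in the pack before each player draws.
        hearts_left = [hearts, hearts - xa, hearts - xa - xb]
        others_left = [others, others - (a - xa), others - (a - xa) - (b - xb)]
        if min(hearts_left + others_left) < 0:
            continue
        ways = 1
        for h, o, x, k in zip(hearts_left, others_left, (xa, xb, xc), (a, b, c)):
            ways *= comb(h, x) * comb(o, k - x)
        if ways:
            dist[(xa, xb, xc)] = Fraction(ways, total)
    return dist

def correlation(dist, i, j):
    """Product-moment correlation of components i and j of the score triple."""
    def mean(f):
        return sum(p * f(s) for s, p in dist.items())
    mi, mj = mean(lambda s: s[i]), mean(lambda s: s[j])
    cov = mean(lambda s: (s[i] - mi) * (s[j] - mj))
    vi = mean(lambda s: (s[i] - mi) ** 2)
    vj = mean(lambda s: (s[j] - mj) ** 2)
    return float(cov) / sqrt(float(vi) * float(vj))

n, hearts, a, b, c = 52, 13, 5, 4, 3   # illustrative values only
dist = joint_heart_scores(n, hearts, a, b, c)
r_ab = correlation(dist, 0, 1)
r_ac = correlation(dist, 0, 2)
r_bc = correlation(dist, 1, 2)

# Right-hand side of the equation of partial correlation.
partial = (r_bc - r_ab * r_ac) / sqrt((1 - r_ab ** 2) * (1 - r_ac ** 2))
# Correlation of the B- and C-scores when A's choice is eliminated.
two_player = -sqrt(b * c / ((n - b) * (n - c)))
print(r_ab, r_ac, r_bc, partial, two_player)
```

With these values the partial-correlation expression agrees with $-\sqrt{bc/((n-a-b)(n-a-c))}$ and so differs from the two-player value, although regression of each score on another remains linear.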


What is common ground to L.C.V. and L.R. is that each suffices to guarantee that the product- 
moment index has its essential summarising properties, in that its limits of + 1 define perfect 
correspondence. Its zero value consistent with independence is inherent in its definition, since 
independence implies zero covariance. That linear regression in one dimension alone suffices 
to define the limits of $r_{ab}$ follows from the fact that L.R. implies the identity 

$$r_{ab}^2 = r_{ba}^2 = \frac{V(M_{a.b})}{V_a} = \frac{V_a - M(V_{a.b})}{V_a}$$

When correspondence is perfect $V_{a.b} = 0 = M(V_{a.b})$, so that $r_{ab}^2 = 1$. The summarising 
properties of the p-m index when L.C.V. holds good follow from the structure of the 2 types 
respectively definitive of a consequential and a concurrent relationship. Thus for one common 
bonus (x,,) 


Pe Av, 
A EN” OS... 
a ABV? 


AE CRY, AA 
Perfect correspondence in this case arises when the player’s individual score is zero, so that 
Va.o = 0 = V,.., and both the foregoing expressions reduce to unity. 

We may sum up the foregoing discussion : 

(a) linear concomitant variation and linear regression have this in common that each is a 
sufficient, neither being a necessary, condition of the limiting values set by perfect 
correspondence to the product-moment index ; 


(b) linear concomitant variation is a sufficient condition of the validity of the equation of 
partial correlation, but is neither a sufficient nor a necessary condition of linear 
regression ; 


(c) linear regression is not a sufficient nor a necessary condition either of L.C.V. or of the 
validity of the equation of partial correlation ; 


(d) L.C.V. defines a causal nexus in the sense that it prescribes two different equations each 
definitive of a consequential relationship as an adequate and explicit description of a 
concurrent relationship ; 


(e) L.R. is a descriptive device which implies no causal nexus, since regression may be 
linear in both dimensions, in one dimension or in neither when the relation between 
the variates is concurrent. 


CHAPTER 13 


ASSUMPTIONS UNDERLYING ANALYSIS 
AND SYNTHESIS OF VARIANCE 


13.00 ANALYSIS OF VARIANCE 


No statistical technique has attained in so short a time greater popularity than the one referred 
to in the title of this chapter ; and it is safe to presume that any reader of this book will have 
at least a nodding acquaintance with it. One reason for its extensive use in biological and socio- 
logical work is the existence of manuals which give very explicit directions for the necessary 
computations by recourse to examples chosen from contemporary investigations. To the 
student who understands the logical credentials of the method and the algebraic assumptions 
of the significance tests it invokes, such instruction is invaluable. Since it is accessible in 
any well-stocked library, there will be no need to burden ourselves in this context with arith- 
metical illustrations which the student will find set out in such texts as those of Snedecor, 
Hagood, Tippett and others. 

In what follows our aim is different. It is all too easy to be led astray by extraneous simi- 
larities, if one relies on exemplary material as a guide to the best way of dealing with a statistical 
problem. Seemingly similar situations in the conduct of enquiries may indeed raise essentially 
diverse logical issues and may be consistent with very different admissible assumptions about 
score distributions. Consequently, recourse to a statistical technique without a clear grasp of 
its rationale is an invitation to its misuse ; and the theme of this chapter is no exception to the 
rule. For reasons stated in 11.00, we shall not attempt to set forth the appropriate significance 
tests at this stage. They will be the subject of treatment in Chapter 16. Here our concern is 
to clarify what we can accomplish by means of the Analysis of Variance, and also what we cannot. 

At the outset, it is important to recognise that the term itself covers several statistical 
procedures of which some have wider applicability than others. Though their several limitations 
have been the theme of discussion in scientific journals,* notably by Churchill Eisenhart and by 
Lee Crump, whose views we quote below, the student who is not a professional mathematician 
can turn to few, if any, accessible sources from which it is possible to get into focus what factual 
postulates justify the relevance of the algebraic expressions to a practical situation. Indeed, 
the intricacy of the relevant computations, so adequately expounded elsewhere, and the novelty 
of the mathematical technique invoked to justify the appropriate tests of significance, alike con- 
spire to defeat the attempt to do so, unless we distinguish sharply between the following issues : 


(a) the derivation (as in 11.05) of computing devices which rely on tautologies of a grid, 
and as such have no necessary connexion with the theory of probability ; 

(b) what causal assumptions about the real world are implicit in the several procedures we 
shall discuss in this chapter ; 

(c) what factual assumptions we implicitly make about score distributions in prescribing 
tests which are the theme of Chapter 16 ; 

(d) the formal algebra (Chapter 15) of the distributions invoked as a basis of such tests. 


* S. Lee Crump (1946), ‘Estimation of Variance Components in Analysis of Variance’. Biometrics (Amer. Stat. 
Ass.) 2, 1. 

H. E. Daniels (1939), ‘Estimation of Components of Variance’. J.R.S.S. Supp., 6, 186. 

Churchill Eisenhart (1947), ‘The Assumptions underlying Analysis of Variance’. Biometrics, 3, 1. 


13.01 MULTIPLE CRITERIA OF CLASSIFICATION 


Statistical treatment of practical issues is possible only when we can score our observations in one 
of two ways: (a) taxonomically, as when we say that the number or proportion of items with 
an attribute A (e.g. sickness, colour) is some number x in a sample of size r ; (b) representatively, 
when we assign a measurement (e.g. height, weight) or a number (e.g. wages, seeds per pod) to 
each sample item and specify the sample score by a sum, mean or other figure which takes account 
of their individual values. Having assigned scores of one or other sort to our observations, we 
may classify them with a view to disclosing some agency which makes them vary. For example, 
we may divide 

(i) a population of diphtheria patients into those who respectively did and did not receive 
serum treatment, scoring the sub-samples taxonomically by the proportion of fatal cases 
in each ; 

(ii) a batch of children of the same age and sex into groups respectively travelling at least two 
miles and less than two miles a day to get to school, scoring the result representatively 
by recording the median place in the terminal examination or the mean number of 
absences in a year. 


In either case, the practical issue is reducible to the same type of question when stated in the 
language of statistics used by the predominant school of the last two decades, that of 
R. A. Fisher. Our first concern is to decide whether the magnitude of the difference observed 
is consistent with the null hypothesis that the sub-samples come from the same universe. If we 
may legitimately conclude contrariwise, we may then seek to arrive at some estimate of how the 
universes themselves differ. If we confine our attention to a single criterion of classification, 
the effects of treatment or travelling long distances in the foregoing examples, we may often 
make a two-fold split; but it is sometimes difficult to exclude sources of variation other than 
those which are our main concern and therefore convenient to divide our material into more 
than two classes within the framework of the same criterion. One way in which we may then 
proceed is the theme of what follows. 

Let us first be clear about what we mean by class and by criterion of classification in a situa- 
tion involving two criteria and two or more classes with respect to each criterion. We shall 
suppose that we have before us figures w.r.t. a single determination of the red blood cell count 
of one male and one female of the Angora, Blue Bevran and Polish Giant breeds of rabbits. We 
may then lay out our scores ($x_{ij}$) gridwise as below. 


TABLE I 

              Angora    Blue Bevran    Polish Giant    Row 
Male          x_11      x_21           x_31            j = 1 
Female        x_12      x_22           x_32            j = 2 
Column        i = 1     i = 2          i = 3 


Here we have two criteria of classification sex and breed, involving 2 classes w.r.t. the first and 
3 classes w.r.t. the second. It may well be that sex affects the score value in the sense that the 
mean score of one sex differs from that of the other in the absence of any other source of variation, 
i.e. agency responsible for score differences. To say that sex is the only source of variation so 
defined would signify that the scores of all males are the same and those of all females are the same, 
the single male score value (which is then the male mean score) being different from the single 
female score value (which is then the female mean score). Thus the variance ($V_r$) of the score 
distribution within each row will be zero, so that $M(V_r) = 0$. Hence the variance of the row 
mean scores $V(M_r)$ is equal to the total variance ($V$) in virtue of the tautology of the score grid, 
$V = M(V_r) + V(M_r)$. Contrariwise, the column means will be identical so that $V(M_c) = 0$ 
and $V = M(V_c)$. By the same token, we might write $V = V(M_c)$ and $M(V_c) = 0$ if the only 
source of variation were the breed, in which case also $V(M_r) = 0$ and $M(V_r) = V$. 
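
The tautology of the score grid is easy to verify numerically. The sketch below uses invented red-cell scores purely for illustration (they are not data from the text) and checks that the total variance equals the mean within-class variance plus the variance of the class means, for rows and for columns alike. It then repeats the check for a grid in which sex is the only source of variation.

```python
from statistics import fmean, pvariance

def decompose(groups):
    """Return (V, M(V_g), V(M_g)) for equal-sized groups of scores."""
    pooled = [x for g in groups for x in g]
    total_v = pvariance(pooled)
    mean_within = fmean([pvariance(g) for g in groups])
    between = pvariance([fmean(g) for g in groups])
    return total_v, mean_within, between

# Hypothetical r.b.c. scores: rows are male and female, columns the three breeds.
grid = [[5.2, 5.9, 6.4],
        [4.8, 5.5, 6.1]]
columns = [list(col) for col in zip(*grid)]

v, m_vr, v_mr = decompose(grid)
_, m_vc, v_mc = decompose(columns)
print(v, m_vr + v_mr, m_vc + v_mc)

# Sex as the only source of variation: one value for all males, another for all females.
sex_only = [[5.0, 5.0, 5.0],
            [4.0, 4.0, 4.0]]
sv, s_m_vr, s_v_mr = decompose(sex_only)
_, s_m_vc, s_v_mc = decompose([list(col) for col in zip(*sex_only)])
print(sv, s_m_vr, s_v_mr, s_m_vc, s_v_mc)
```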

In general, neither proposition last stated will be true. For we know that any single 
determination of the r.b.c. is subject to error of observation and to individual circumstances 
unconnected with either sex or breed. In virtue of these residual sources of variation, the 
scores within a row or column will differ in the absence of any row-effect attributable to sex or 
column-effect attributable to breed. Because of the residual effect alone, both row means 
and column means may therefore differ. If there is no row-effect, the fact that $V(M_r)$ exceeds 
zero is then attributable to the residual source of variation alone, as is the fact that $V(M_c)$ 
exceeds zero when there is no column-effect. If there is neither a row- nor a column-effect, 
the values of both $V(M_r)$ and $V(M_c)$ depend uniquely on the residual source, and we may 
expect to discover some relation between them, consistent with that assumption. 

To say that there is no sex-effect in this context is to say that row samples come from one and 
the same universe ; and we express this alternatively by saying that the universe is homogeneous 
w.r.t. the row-criterion of classification. To say that there is no breed-effect is to say that 
column samples come from the same universe and that such a universe is homogeneous w.r.t. 
the column-criterion. To say that the universe is homogeneous in both dimensions is the 
same as saying that the residual sources suffice to account for all variation. 

One class of procedure subsumed under the term analysis of variance has as its aim to decide 
whether a system of scores is homogeneous w.r.t. one or more criteria of classification, i.e. to test 
the null hypothesis that one or other putative source of variation defined by the classificatory 
set-up is negligible. Thus the null hypothesis is that sex or breed or both do not significantly 
affect the r.b.c. in the example last cited. If our 2-way table for 2 criteria of classification has 
c columns, r rows and hence rc cells in all, homogeneity w.r.t. both criteria signifies that each 
row-set is a c-fold, each column-set is an r-fold and the entire set of scores is an rc-fold sample 
from one and the same universe. If so, our problem is: 


(a) to define consistent relations between parameters of the score distribution referable to 
either dimension alone and to the grid as a whole ; 


(b) to test the consistency of such parameters. 


Any significance test specified by (b) must naturally rely on certain assumptions about the 
score distribution of the putative common universe. Such assumptions may be more or less 
plausible in a given situation; but we can define criteria of homogeneity in the sense im- 
plied by (a) without invoking them. It will therefore be convenient to reserve discussion of 
(b) till a later stage. 

A corresponding dichotomy is helpful, when we turn to other classes of problems to which 
the expression analysis of variance may refer. If the universe is not homogeneous, we are 
entitled to ask how much of the variance is attributable to one or other source. The end in view 
may then be 


(i) to make an exhaustive balance sheet exhibiting what fraction of the total variance arises 
from each source ; 

(ii) more modestly, to assess the residual component with a view to specifying the sampling 
variance of a set of class means in the absence of variation arising from other sources. 

More explicitly, (i) signifies a specification of the extent to which we should proportionately 
reduce the variance of the parent universe, if we eliminated one or other source of variation, 
e.g. by replacing all males by females or vice versa in the foregoing example. It is important to 
notice that we can reduce it in two ways, and we have at this stage no reason to believe that 
the results would be the same. They are indeed the same only if we make certain assumptions 
which have no relevance to the discussion of homogeneity. 

To appreciate the meaning of (ii) above, we should recall that the observed variance of the 
column (breed) samples of our foregoing illustration will partly depend on sex if sex is a true 
source of variation. If our aim is to ascertain the variance of each column mean regarded 
as a parameter of the breed effect alone, i.e. the true variance of the mean of males alone or 
of females alone for each breed, our only concern is therefore with one component (the 
residual) of variance. 

To construct an exhaustive balance sheet of all the components of variance or to specify 
a particular component, we have to rely on certain assumptions about the ways in which the 
several sources of variation contribute to the individual score value; and this raises an issue 
which does not arise when our sole aim is to test the null hypothesis that one or other source of 
putative variation is negligible. Needless to say, we can ask no more of our sample than an 
estimate of any component of variance. There then arises the question : to what sampling error 
are our estimates subject, i.e. what are their fiducial or confidence boundaries ? Here we must 
introduce other assumptions concerning the distribution of the score components. We shall 
find it easier to steer a way through a labyrinth of practical difficulties, if we examine the 
simpler issue: how is it possible to construct the balance sheet? Our first task will be an 
attempt to visualise random distributions of samples classified w.r.t. one, two or three criteria 
by recourse to statistical models. It will then be possible to exhibit the formal logic of 
analysis of variance by recourse to the symbolism of 11.05 without the danger of losing ourselves 
in a maze of symbols. 


13.02 THE COMPLETE SAMPLING DISTRIBUTION 


In Vol. I we have become familiar with the chessboard device as a means of setting out the 
distribution of variously classified r-fold samples from a static n-fold universe in conformity 
with the principle of equipartition of opportunity for association ; but we have also acquainted 
ourselves with the advantage of a somewhat different approach in 12.00, where we have spoken 
of a universe in action, i.e. as a random distribution of all possible r-fold samples. In the entire 
assemblage of samples constituting such a random distribution every item (score value) is 
necessarily present with the same proportionate frequency as in the parent universe. This 
is sufficiently evident from the build up of the 3-fold toss of a penny when we score heads as 
1 and tails as 0: 


Two-fold Toss 

         0      1 
  0     0.0    0.1 
  1     1.0    1.1 

Heads 4    Tails 4 

Three-fold Toss 

         0.0      0.1      1.0      1.1 
  0     0.0.0    0.0.1    0.1.0    0.1.1 
  1     1.0.0    1.0.1    1.1.0    1.1.1 

Heads 12    Tails 12 


In the same way we may lay out as below the 2-fold toss distribution for a tetrahedral die 
with face scores 1, 2, 2, 3 as in Fig. 70 of Vol. I, and the reader may check the rule by 
constructing a similar grid for the 3-fold toss : 


         1      2      2      3 
  1     1.1    1.2    1.2    1.3 
  2     2.1    2.2    2.2    2.3 
  2     2.1    2.2    2.2    2.3 
  3     3.1    3.2    3.2    3.3 

Unit trial scores .     1     2     3 
Numbers of each .       8    16     8 
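
The reader who prefers to check the rule by machine may count the unit scores directly. The sketch below pools every individual score from all equally likely r-fold tosses and confirms that each score value keeps the proportion it has on the faces of the die. The function names are hypothetical.

```python
from collections import Counter
from itertools import product

def pooled_unit_scores(faces, r):
    """Count every individual score over all equally likely r-fold samples."""
    counts = Counter()
    for sample in product(faces, repeat=r):
        counts.update(sample)
    return counts

def proportions(counts):
    """Convert raw counts of each score value into proportionate frequencies."""
    total = sum(counts.values())
    return {score: k / total for score, k in sorted(counts.items())}

penny_3 = pooled_unit_scores([0, 1], 3)        # 3-fold toss of a penny
die_2 = pooled_unit_scores([1, 2, 2, 3], 2)    # 2-fold toss of the tetrahedral die
die_3 = pooled_unit_scores([1, 2, 2, 3], 3)    # 3-fold toss of the same die

print(dict(penny_3))
print(dict(die_2), proportions(die_2))
print(dict(die_3), proportions(die_3))
```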


The fact that the distribution of individual score values, i.e. scores of unit trials, in each cell 
entry of the chessboard lay-out for the r-fold sample is the same as that of individual score 
values in the parent universe, i.e. the same as the unit sample distribution, signifies that the mean of 
all individual score values in the chessboard lay-out of the distribution sample classes is identical 
with that of the unit sample distribution. For the same reason the variance of the distribution 
of individual scores in the entire assemblage of chessboard samples is likewise identical with 
that of the unit sample distribution. If we postulate that the relative frequencies of such sub- 
universes tally with the relative frequencies of corresponding r-fold sample classes in a random 
sample distribution, we can therefore conceive of each class of samples as a class of sub-universes 
within the parent universe. 

To develop this conception further without losing sight of its visual meaning, we may here 
usefully pause to recall some elementary properties of sampling distributions. In Vol. I we 
have drawn a distinction between the following parameters of a sample distribution classified 
w.r.t. one criterion of sample structure (e.g. number of hearts in the sample or total number of 
pips face upwards) : 


(a) the unit-sample variance ($V_u$), being the variance of the distribution of individual 
scores in the parent universe ; 


(b) the variance ($r \cdot V_u$) of the distribution of the sum* of individual score values in r-fold 
samples drawn independently from the parent universe with replacement ; 


(c) the variance $V(M_{x.s}) = (V_u \div r)$ of the distribution of the mean ($M_{x.s}$) of individual 
score values in such r-fold samples ; 


(d) the observed variance $V_{x.s}$ of the distribution of individual score values ($x_i$) whose mean 
value is $M_{x.s}$ within a particular sample, so that 


$$V_{x.s} = \frac{1}{r}\sum_{i=1}^{r}(x_i - M_{x.s})^2 = \frac{1}{r}\sum_{i=1}^{r} x_i^2 - M_{x.s}^2 \qquad \text{(i)}$$


* Note: In the domain of binary taxonomic scoring, the unit scores are respectively 0 and 1 and the terms of the 
binomial $(q + p)^1$ define the unit sampling distribution with mean p and variance $V_u = pq$. The r-fold score sum is then the 
r-fold sample raw score (0, 1 . . . r) whose distribution corresponds with successive terms of the binomial $(q + p)^r$ 
with mean rp and variance rqp. The r-fold sample mean score is the proportionate score whose distribution 
likewise accords with successive terms of $(q + p)^r$ for score values 0, 1/r, 2/r . . . 1, the mean of this distribution being p 

In accordance with the notation of 11.05, we may write (i) more economically in the form 

$$V_{x.s} = E_r(x_i - M_{x.s})^2 = E_r(x_i^2) - M_{x.s}^2 \qquad \text{(ii)}$$

In Chapter 7 of Vol. I we have seen that the expected (mean) value of the sample variance 
($V_{x.s}$) is somewhat smaller than that of the unit sample distribution, being in fact defined by 
the relation 

$$M(V_{x.s}) = \frac{r-1}{r}\,V_u \qquad \text{(iii)}$$

We shall now derive this relation by recourse to a notation which makes the meaning of 
every step explicit. Let us first be clear about what we call the mean sample variance in 
this context. The statistic ($V_{x.s}$) defined by (i)-(ii) is an r-fold sample statistic. The statistic 
defined by (iii) is the mean value of this sample statistic in the universe of all r-fold samples 
with relative frequencies assigned by successive application of the product rule. We may 
make the distinction clear by: (a) using $E_s$ for the operation of extracting the universe mean 
of the random distribution of any sample parameter; (b) employing the dot notation to 
distinguish any parameter of the r-fold sample (e.g. $M_{x.s}$ or $V_{x.s}$ for the mean score or variance) 
from a parameter of the universe (e.g. $M_x$ or $V_u$). With these conventions we may write 

$$M(V_{x.s}) = E_s(V_{x.s}) = E_s[E_r(x_i^2) - M_{x.s}^2] = E_s . E_r(x_i^2) - E_s(M_{x.s}^2) ;$$
$$V(M_{x.s}) = E_s(M_{x.s} - M_x)^2 = E_s(M_{x.s}^2) - M_x^2 ;$$
$$\therefore M(V_{x.s}) + V(M_{x.s}) = E_s . E_r(x_i^2) - M_x^2 = E_s . E_r(x_i - M_x)^2 .$$

The expression on the right of the last equation is the mean value of the square of individual 
score deviations from the true mean in the universe of all r-fold samples, i.e. that of the 
unit sample distribution, so that 

$$M(V_{x.s}) + V(M_{x.s}) = V_u$$

We have also seen that 

$$V(M_{x.s}) = \frac{V_u}{r}$$

$$\therefore M(V_{x.s}) = V_u - \frac{V_u}{r} = \frac{r-1}{r}\,V_u \qquad \text{(iv)}$$


We may conveniently visualise a complete sample distribution involving one criterion of 
classification by setting out each permutation of individual scores in the chessboard cell entries 
as rows of a 2-dimensional score grid. Fig. 98 and Table 2 show the relevant calculations for 
a score grid exhibiting the random distribution of 3-fold samples of a flat circular die with 1 pip 
on one face and 2 pips on the other. The binomial $(\tfrac{1}{2} + \tfrac{1}{2})^1$ then defines the unit sample 
distribution, so that $M_u = \tfrac{3}{2}$ and $V_u = \tfrac{1}{4}$. Alternatively, we may lay out the distribution more 
economically by recording the frequencies of all samples containing the same set of individual 
score values regardless of order. Table 3 employs the frequency grid lay-out for the complete 
random sample distribution for the 4-fold toss of a tetrahedral die with 1 pip on one face, 2 pips 
on two other faces and 3 pips on the fourth. The u.s.d. accords with terms of $(\tfrac{1}{2} + \tfrac{1}{2})^2$, so that 
$M_u = 2$, $V_u = \tfrac{1}{2}$, and $M_x = 2$. The cell entries specify the number of permutations consistent 
with the same combination of individual scores in the 4-fold sample. Thus there are 4 
permutations of toss-order consistent with the sample structure 1113 and 12 permutations consistent 
with 1233 etc. 


* (Note continued) and the variance $pq \div r$ in accordance with (c) above. On this understanding, we may regard taxonomic scoring 
in the binary universe as a special case of representative scoring. 

More generally, we speak of the unit sample distribution as a binomial variate, if the frequencies of individual 
scores correspond to successive terms of $(q + p)^a$ in the range m to $m + a\Delta x$, when the frequency of a score $m + x\Delta x$ 
is given by $a_{(x)}\,p^x q^{a-x}$. The mean ($M_u$) of the unit sample distribution is then $m + ap\Delta x$ and the variance 
$apq(\Delta x)^2$. What we call taxonomic scoring in the binary universe signifies that a = 1, m = 0, $\Delta x = 1$ so that 
$M_u = p$. The binary universe of the flat circular die with 1 pip on one face and 2 on the other does not fulfil this 
specification, since $\Delta x = 1$, but m = 1 and $M_u = (p + 1)$, though $V_u = pq$ as when the method of scoring is 
taxonomic. When $p = \tfrac{1}{2} = q$ and a = 1, the distinction between a rectangular and binomial variate breaks down. 
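
The permutation counts quoted for the tetrahedral die can be reproduced in a few lines. This is only a sketch with hypothetical function names: `toss_orders` counts the distinct toss-orders of a sample structure, and `sample_count` multiplies that by the number of faces able to yield each score (two faces carry 2 pips), so that the counts for all fifteen structures sum to the 256 equally likely 4-fold tosses.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import factorial

FACES = (1, 2, 2, 3)   # tetrahedral die of the text

def toss_orders(structure):
    """Distinct orderings of the scores in a sample structure such as (1, 1, 1, 3)."""
    orders = factorial(len(structure))
    for repeats in Counter(structure).values():
        orders //= factorial(repeats)
    return orders

def sample_count(structure):
    """Equally likely face sequences yielding the structure."""
    ways = toss_orders(structure)
    for score in structure:
        ways *= FACES.count(score)
    return ways

structures = list(combinations_with_replacement((1, 2, 3), 4))
total_samples = sum(sample_count(s) for s in structures)

print(toss_orders((1, 1, 1, 3)), toss_orders((1, 2, 3, 3)))
print(len(structures), total_samples)
```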


[Fig. 98: THE UNBIASED ESTIMATE OF VARIANCE. Diagram of the eight 3-fold samples with columns for $M_{x.s}$, $M_{x.s}^2$, $E_r(x^2)$, $V_{x.s}$ and $s^2$, and side panels headed "Variance of Unit Sample Distribution", "Variance of Mean of 3-fold Samples" and "Mean of Variance of 3-fold Sample Score Distribution".] 

Fig. 98. The Unbiased Estimate of Variance: the complete sampling distribution of the 3-fold toss of the die of 
Fig. 67 in Vol. I, exhibiting the relation between the mean variance of the score distribution within each sample and 
the variance of the u.s.d., i.e. the score distribution of all possible samples pooled in appropriate proportions. 


TABLE 2 

Sample     M_{x.s}    M²_{x.s}    E_r(x²)    V_{x.s}    s² = (3/2)V_{x.s} 
1 1 1      1          1           1          0          0 
1 1 2      4/3        16/9        2          2/9        1/3 
1 2 1      4/3        16/9        2          2/9        1/3 
2 1 1      4/3        16/9        2          2/9        1/3 
1 2 2      5/3        25/9        3          2/9        1/3 
2 1 2      5/3        25/9        3          2/9        1/3 
2 2 1      5/3        25/9        3          2/9        1/3 
2 2 2      2          4           4          0          0 

Total      12         56/3        20         4/3        2 
Mean       3/2        7/3         5/2        1/6        1/4 

The means of the five columns are respectively $M_x$, $E_s(M_{x.s}^2)$, $E_s . E_r(x^2)$, $M(V_{x.s}) = E_s(V_{x.s})$ and $E_s(s^2)$. 

$V_u = \tfrac{5}{2} - (\tfrac{3}{2})^2 = \tfrac{1}{4}$ ;   $V(M_{x.s}) = \tfrac{7}{3} - (\tfrac{3}{2})^2 = \tfrac{1}{12}$ 

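TABLE 3 

Sample        Numbers of individual scores       M_{x.s}   M²_{x.s}   V_{x.s}   s² 
Structure       1       2       3     Totals 
1111            4       0       0        4        1         1          0         0 
1112           24       8       0       32        5/4       25/16      3/16      1/4 
1113           12       0       4       16        3/2       9/4        3/4       1 
1122           48      48       0       96        3/2       9/4        1/4       1/3 
1123           48      24      24       96        7/4       49/16      11/16     11/12 
1133           12       0      12       24        2         4          1         4/3 
1222           32      96       0      128        7/4       49/16      3/16      1/4 
1223           48      96      48      192        2         4          1/2       2/3 
1233           24      24      48       96        9/4       81/16      11/16     11/12 
1333            4       0      12       16        5/2       25/4       3/4       1 
2222            0      64       0       64        2         4          0         0 
2223            0      96      32      128        9/4       81/16      3/16      1/4 
2233            0      48      48       96        5/2       25/4       1/4       1/3 
2333            0       8      24       32        11/4      121/16     3/16      1/4 
3333            0       0       4        4        3         9          0         0 

Totals        256     512     256     1024 


$V_u = \tfrac{256}{1024}(1 - 2)^2 + \tfrac{512}{1024}(2 - 2)^2 + \tfrac{256}{1024}(3 - 2)^2 = \tfrac{1}{2}$. 

$M_x = E_s(M_{x.s}) = 2$ ;   $V(M_{x.s}) = \tfrac{1}{8}$ ;   $M(V_{x.s}) = \tfrac{3}{8}$ ;   $E_s(s^2) = \tfrac{1}{2}$. 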

With a lay-out of either type we can visualise the meaning of (iv) above in a new way, since 
$M(V_{x.s}) = E_s(V_{x.s})$. We may write 

$$E_s\left(\frac{r}{r-1}\,V_{x.s}\right) = \frac{r}{r-1}\,E_s(V_{x.s}) = V_u$$

We can thus define a sample statistic whose expected value is the true variance of the universe 
by the relations 

$$\frac{r}{r-1}\,V_{x.s} = s^2 = \frac{\sum (x_i - M_{x.s})^2}{r-1} \quad\text{and}\quad E_s(s^2) = V_u \qquad \text{(v)}$$

It is customary to speak of $s^2$ so defined as the unbiased estimate of the u.s.d. variance, i.e. 
variance of the distribution of individual scores in the parent universe. 
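
The relation between the mean sample variance, the variance of the sample mean and the u.s.d. variance can be confirmed by exhaustive enumeration. The sketch below does so in exact fractions for the two dice used as illustrations in this section. The function name and the dictionary keys are hypothetical.

```python
from fractions import Fraction
from itertools import product

def sampling_summary(faces, r):
    """Exact V_u, V(M), M(V) and E(s^2) over all equally likely r-fold samples."""
    n = len(faces)
    m_u = Fraction(sum(faces), n)
    v_u = sum((x - m_u) ** 2 for x in faces) / n
    means, variances = [], []
    for sample in product(faces, repeat=r):
        m = Fraction(sum(sample), r)
        means.append(m)
        # Observed variance within the sample, with divisor r.
        variances.append(sum((x - m) ** 2 for x in sample) / r)
    count = len(means)
    var_of_mean = sum((m - m_u) ** 2 for m in means) / count
    mean_of_var = sum(variances) / count
    return {
        "V_u": v_u,
        "V(M)": var_of_mean,
        "M(V)": mean_of_var,
        "E(s^2)": mean_of_var * Fraction(r, r - 1),
    }

flat_die = sampling_summary([1, 2], 3)            # 3-fold toss, faces 1 and 2
tetrahedral = sampling_summary([1, 2, 2, 3], 4)   # 4-fold toss, faces 1, 2, 2, 3
print(flat_die)
print(tetrahedral)
```

In both cases M(V) + V(M) reproduces the u.s.d. variance, and the mean of $s^2$ equals that variance.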

The foregoing analysis involves classification of the sample by one criterion. We can introduce 
a second criterion of classification, if we toss more than one die. We then have to represent 
each class of samples by a two-dimensional lay-out, and the entire assemblage of the random 
sample distribution as a 3-dimensional grid of which each layer is a sample. The number 
of layers having one and the same set of scores in corresponding cells must then tally with the 
relative frequency of the sample so defined. 

Fig. 99 shows the build up of the grid in accordance with the chessboard principle for the 
2-fold toss of each of 2 unbiased pennies. If we assume that each toss is a fair toss, the 
distribution of scores in each pillar, each row-slab and each column-slab will accord with one and the 
same unit sample distribution, i.e. that of individual scores in the entire 3-dimensional universe 
of samples. We then say that the universe is homogeneous for both criteria of classification. The 
pillar-mean scores of this example will thus be 


                  Coin 
                A       B 
  Toss   1     1/2     1/2 
         2     1/2     1/2 


Fig. 99. The results of 2 tosses of an unbiased coin set out as a 2-way classification, the criteria being identity 
of coin and order of toss. All possible samples are shown in their correct proportions. 

[Fig. 100: chessboard diagram with panels "Two-fold Toss of unbiased Coin A" and "Two-fold Toss of Coin B, with 2:1 bias in favour of Heads".] 

Fig. 100. Chessboard lay-out for 2-fold toss of a biased coin (2:1 in favour of heads) exhibiting results for the 
same 2 criteria of classification as in Fig. 99. 


In this case, each of 16 possible samples of different structure occurs with equal frequency. 
If one coin has a bias the number of different sample classes is still sixteen ; but their frequencies 
are not all equal. Fig. 100 shows each stage of the lay-out for the 2-fold toss of two coins, one of 
which (A) has no bias, while the other (B) falls head upwards twice as often as tails. Thus the 
definitive binomials of the unit sample distributions are respectively $(\tfrac{1}{2} + \tfrac{1}{2})^1$ and $(\tfrac{1}{3} + \tfrac{2}{3})^1$. 

Here the entries of the same column refer to the same coin, entries of the same row to the 
same toss-order. Each column-slab in the universe of Fig. 101 thus constitutes a homogeneous 
sub-universe in the sense that the distributions of individual scores within pillars of one and 
the same column-slab are identical, but the distributions of scores in different column-slabs 
are not identical. The within-pillar distributions of the same row are not identical; but the 
row-slab distributions are identical with one another and therefore with that of the universe as 
a whole. The pillar-mean scores will be 


                  Coin 
                A       B 
  Toss   1     1/2     2/3 
         2     1/2     2/3 


[Fig. 101: diagram headed "Sample distribution for 2-fold Toss of unbiased Coin A and 2-fold Toss of biased (2:1) Coin B".] 

Fig. 101. The 3-dimensional universe of all samples shown in Fig. 100. 


Of this set-up, we say that the universe is homogeneous only for the row-criterion of 
classification. Thus we can define a universe as homogeneous w.r.t. either or both of two criteria 
of classification, if we conceive it as a complete random distribution of rc-fold samples classified 
with respect to one (column) criterion involving c classes and a second (row) criterion involving 
r classes. So conceived our criteria of homogeneity are as in Table 4. 


TABLE 4 


DISTRIBUTION OF INDIVIDUAL SCORES IN 

HOMOGENEOUS                                                         Pillars within        Pillars within 
WITH RESPECT TO           Row-slabs             Column-slabs        row-slab              column-slab 

Both criteria             as for whole grid     as for whole grid   as for whole grid     as for whole grid 
Row-criterion only        as for whole grid     different           different             as for column-slab 
Column-criterion only     different             as for whole grid   as for row-slab       different 
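
The criteria of Table 4 can be illustrated with the two-coin universes already described. Since each score is 0 or 1, the mean of a pillar or slab fixes its whole distribution, so it suffices to compare means. The sketch below, with hypothetical function names, builds every layer of the universe with its product-rule frequency and reports pillar, row-slab and column-slab means.

```python
from fractions import Fraction
from itertools import product

def pillar_means(p_heads, tosses=2):
    """Mean score of each pillar (toss, coin) over the complete universe of layers."""
    coins = len(p_heads)
    cells = [(t, c) for t in range(tosses) for c in range(coins)]
    means = [[Fraction(0)] * coins for _ in range(tosses)]
    for layer in product((0, 1), repeat=len(cells)):
        # Frequency of this layer by the product rule.
        freq = Fraction(1)
        for (t, c), score in zip(cells, layer):
            freq *= p_heads[c] if score else 1 - p_heads[c]
        for (t, c), score in zip(cells, layer):
            means[t][c] += freq * score
    return means

def slab_means(means):
    """Row-slab and column-slab means derived from the pillar means."""
    rows = [sum(row) / len(row) for row in means]
    cols = [sum(col) / len(col) for col in zip(*means)]
    return rows, cols

fair = pillar_means([Fraction(1, 2), Fraction(1, 2)])     # two unbiased coins
biased = pillar_means([Fraction(1, 2), Fraction(2, 3)])   # coin B favours heads 2:1
print(fair, slab_means(fair))
print(biased, slab_means(biased))
```

For the biased universe the row-slab means agree while the column-slab means differ, which is the row-criterion-only case of the table.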


We can still visualise the distribution of samples from a universe classified w.r.t. 3 criteria, 
if we conceive each sample as a 3-dimensional grid constituting a stratum and the universe of 
samples as a vertical succession of such strata. As a simple illustration we may consider the 
result of the two-fold toss of 4 coins : 


A French silver C American silver 
B French copper D American copper 


We have now 3 criteria of classification : (a) toss-order ; (b) metal; (c) nationality ; and we are 
entitled to ask whether the universe is homogeneous with reference to any or all, i.e. whether 
each toss is a fair toss, whether French coins have the same bias (if any) as American coins or 
whether copper coins have the same bias (if any) as silver coins. We may lay out in one layer 
of the sample stratum (Fig. 102) French coins by toss-order and metal as our criteria and in the 
other American coins by toss-order and metal in corresponding dimensions. In this case each 
stratum consists of 2 layers each of 2 columns and 2 rows. If there are n classes with respect 
to the layer-to-layer criterion of classification, the stratum will contain ncr cells, and the universe 
grid of s-strata will consist of ncrs cells. 

Here, as elsewhere, we use the term sample frequency ($y_s$) for the proportionate contribution of a
sample class to the entire distribution of such classes, so that the sum of all frequencies is unity. When
we speak of relative frequencies ($f_s$) we signify corresponding whole numbers in the same ratio and define
$s$ as their sum, so that $y_s = (f_s \div s)$. If the grid representative of a complete sample distribution
consists of $s$ layers (or strata, as the case may be), we assume that there are $f_s$ layers (or strata) exhibiting
the lay-out of a particular sample structure of relative frequency $f_s$. We may then define the operation
of taking the mean of a sample parameter ($U_s$) alternatively as

$$E_s \cdot U_s = E(U_s) = \sum y_s\,U_s = \frac{1}{s}\sum_{k=1}^{k=s} U_k$$


If we care to regard the theoretical sampling distribution as continuous, we must interpret the summation 
as an integral. This does not affect the ensuing argument. 


13.03 CRITERIA OF HOMOGENEITY 


We have already defined in general terms what we mean by criteria of homogeneity in this context,
viz. the definition of sample parameters whose expected values should be consistent. In
what follows, we shall first explore the issue vis-à-vis 2 criteria of classification.

With this end in view we conceive our sample as a layer of r rows and c columns in a
3-dimensional grid of s layers like that of Fig. 85. Our assumption will be that the frequency of
layers with a given structure tallies with the frequency of the random distribution of corresponding
rc-fold samples. We shall denote as $x_{ij.k}$ the score of the cell in column i, row j and layer
k, indicating by the convention of 11.05 whether a parameter of the grid refers to a sample, to
a column within a sample or a row within a sample, as below

                  Mean          Variance within    No. of cells
Row               $M_{x.rs}$    $V_{x.rs}$         c
Column            $M_{x.cs}$    $V_{x.cs}$         r
Sample (layer)    $M_{x.s}$     $V_{x.s}$          rc


For the true variance of the putatively homogeneous universe we shall write $\sigma^2 = V_x$. The
evaluation of unbiased estimates of $\sigma^2$ based on different parameters of the sample is simple in
the notation of 11.05, if we recall the fact that the operations denoted by $E_s$, $M_r$ and $M_c$ are such
as we can perform in any order. Thus we may write

$$E_s \cdot M_r(V_{x.rs}) = M_r \cdot E_s(V_{x.rs}) = E(V_{x.rs})$$
$$E_s \cdot M_c(V_{x.cs}) = M_c \cdot E_s(V_{x.cs}) = E(V_{x.cs})$$

Now the sample variances $V_{x.cs}$, $V_{x.rs}$ and $V_{x.s}$ respectively refer to samples of r, of
c and of rc scores. In accordance with the rule we have recalled in 13.02, we may therefore write

$$E(V_{x.cs}) = \frac{r-1}{r}\sigma^2 ; \quad E(V_{x.rs}) = \frac{c-1}{c}\sigma^2 ; \quad E(V_{x.s}) = \frac{rc-1}{rc}\sigma^2$$

Since these expressions involve only the numbers r and c (which are fixed for all the samples
of r rows and c columns) and $\sigma^2$ which is a constant of the universe

$$E_s \cdot M_c(V_{x.cs}) = \frac{r-1}{r}\sigma^2 \quad \text{and} \quad E_s \cdot M_r(V_{x.rs}) = \frac{c-1}{c}\sigma^2$$

Since any systematic sources of variation associated with the row or column criteria of
classification will respectively have the effect of changing the variance of the distribution of
the row-means or that of the column-means, we should expect that estimates of $\sigma^2$ derived
therefrom would be consistent only if the universe is homogeneous. Accordingly, we first seek to
express $\sigma^2$ in terms of the expected values of $V(M_{x.rs})$ and $V(M_{x.cs})$. By recourse to the
fundamental tautology of the grid


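$$E(V_{x.s}) = E_s \cdot M_r(V_{x.rs}) + E_s \cdot V(M_{x.rs})$$
$$\therefore E_s \cdot V(M_{x.rs}) = E(V_{x.s}) - E_s \cdot M_r(V_{x.rs})$$
$$= \frac{rc-1}{rc}\sigma^2 - \frac{c-1}{c}\sigma^2$$
$$\therefore E_s \cdot V(M_{x.rs}) = \frac{r-1}{rc}\sigma^2 \qquad \text{(i)}$$

Similarly we derive

$$E_s \cdot V(M_{x.cs}) = \frac{c-1}{rc}\sigma^2 \qquad \text{(ii)}$$

From (i) and (ii) we can therefore define two sample statistics by the relations

$$E(s_r^2) = \sigma^2 \quad \text{and} \quad s_r^2 = \frac{rc}{r-1} V(M_{x.rs}) \qquad \text{(iii)}$$
$$E(s_c^2) = \sigma^2 \quad \text{and} \quad s_c^2 = \frac{rc}{c-1} V(M_{x.cs}) \qquad \text{(iv)}$$

In the notation of the computing schema defined by (x)-(xiv) of 11.05, we therefore have

$$s_r^2 = \frac{S_r - S}{r-1} \quad \text{and} \quad s_c^2 = \frac{S_c - S}{c-1} \qquad \text{(v)}$$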

We now have two estimates of $\sigma^2$ based on variation in alternative dimensions of the grid.
We can obtain one which takes into account variation in both dimensions from $V_{x.s}$, and another,
which will later prove to have a special significance, if we recall the parameter ($V_z$) defined in
accordance with (viii) and (ix) of 11.05:

$$V_z = V_{x.s} - V(M_{x.rs}) - V(M_{x.cs}) = M_r(V_{x.rs}) + M_c(V_{x.cs}) - V_{x.s}$$

The interest of this statistic has emerged in our discussion of the Handicap Score-grid Model 
of Chapter 10 (Vol. I). We then saw that the corresponding parameter of the universe is equal 
to: (a) the total variance in the absence of a row or column source of variation ; (b) the residual
component of variance, if strictly additive independent row and column sources contribute to
total variation. From the above, we have

$$E(V_z) = E_s \cdot M_r(V_{x.rs}) + E_s \cdot M_c(V_{x.cs}) - E(V_{x.s})$$
$$= \frac{c-1}{c}\sigma^2 + \frac{r-1}{r}\sigma^2 - \frac{rc-1}{rc}\sigma^2$$
$$= \frac{(r-1)(c-1)}{rc}\sigma^2$$


We can now define a statistic whose expected value is the true variance $\sigma^2$ by the relations

$$E(s_z^2) = \sigma^2 \quad \text{and} \quad s_z^2 = \frac{rc}{(r-1)(c-1)} V_z \qquad \text{(vi)}$$

In the notation of the computing schema defined by (x)-(xiv) of 11.05, we may write this as

$$s_z^2 = \frac{S_x + S - S_r - S_c}{(r-1)(c-1)} \qquad \text{(vii)}$$
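The claim that the row, column and residual statistics each have the true variance as expected
value in a homogeneous universe lends itself to an exact check by enumeration. The following
Python sketch is an illustration only and not part of the computing schema of 11.05: the function
names and the toy universe of three equally likely scores (0, 1, 2) are assumptions introduced
here. It lists every possible grid of 2 rows and 3 columns, computes the three estimates for each
from the sums of squares, and averages them with exact fractions.

```python
from fractions import Fraction
from itertools import product


def grid_estimates(grid):
    """Return the row, column and residual estimates for a grid of r rows of c scores."""
    r, c = len(grid), len(grid[0])
    s_x = sum(Fraction(x) ** 2 for row in grid for x in row)       # sum of squared scores
    total = sum(Fraction(x) for row in grid for x in row)
    s = total ** 2 / (r * c)                                       # (grand total)^2 / rc
    s_r = sum(Fraction(sum(row)) ** 2 for row in grid) / c         # squared row totals / c
    s_c = sum(Fraction(sum(col)) ** 2 for col in zip(*grid)) / r   # squared column totals / r
    return ((s_r - s) / (r - 1),
            (s_c - s) / (c - 1),
            (s_x + s - s_r - s_c) / ((r - 1) * (c - 1)))


def expected_estimates(values, r, c):
    """Average the three estimates over every equally likely r-by-c grid of scores."""
    n_grids = len(values) ** (r * c)
    sums = [Fraction(0)] * 3
    for cells in product(values, repeat=r * c):
        grid = [cells[j * c:(j + 1) * c] for j in range(r)]
        for k, estimate in enumerate(grid_estimates(grid)):
            sums[k] += estimate
    return [t / n_grids for t in sums]


values = (0, 1, 2)  # hypothetical homogeneous universe: three equally likely scores
mean = Fraction(sum(values), len(values))
sigma2 = sum((Fraction(v) - mean) ** 2 for v in values) / len(values)
expected = expected_estimates(values, r=2, c=3)
print(sigma2, expected)
```

Under these assumptions all three averages coincide with the variance of the toy universe. The
check is exhaustive for this universe only; it illustrates the algebra and does not replace it.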
The statistic $s_z^2$ takes into account variation in both dimensions of the sample grid, and our
criterion of homogeneity in both dimensions is that the numerical value of neither $s_r^2$ nor $s_c^2$
differs significantly from that of $s_z^2$. We may lay out the three estimates as below :

Sums of Squares                                        Divisors                Unbiased Estimate
($S_x = \sum_{i=1}^{c}\sum_{j=1}^{r} x_{ij}^2$)        (degrees of freedom)    of $\sigma^2$
$rc \cdot V(M_{x.rs}) = S_r - S$                       $r - 1$                 $s_r^2$
$rc \cdot V(M_{x.cs}) = S_c - S$                       $c - 1$                 $s_c^2$
$rc \cdot V_z = S_x + S - S_r - S_c$                   $(r - 1)(c - 1)$        $s_z^2$

The following numerical example referable to a 3 by 3 grid illustrates the computation.

(a) Scores                                      (b) Square Scores
                        Total (T)    T²
          3     1     8     12       144           9     1    64
          9     7     5     21       441          81    49    25
          6     4     2     12       144          36    16     4
Total    18    12    15     45       729          Total      285
T²      324   144   225    693

$S_x = 285$ ; $S = \frac{1}{9}(45^2) = 225$ ; $S_c = \frac{693}{3} = 231$ ; $S_r = \frac{729}{3} = 243$.

Estimate based on    Sum of Squares                  Divisor    Estimate
Rows                 243 − 225 = 18                  2          9
Columns              231 − 225 = 6                   2          3
Residual             285 + 225 − 243 − 231 = 36      4          9
Total                60
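The arithmetic of this example can be retraced mechanically. The sketch below is illustrative;
the function name `sums_of_squares` is an assumption of ours and not part of the schema of 11.05.
It forms the four sums from the 3 by 3 grid of scores and divides the three differences by their
respective divisors.

```python
def sums_of_squares(grid):
    """Return (S_x, S, S_r, S_c) for a grid given as a list of rows of scores."""
    r, c = len(grid), len(grid[0])
    s_x = sum(x * x for row in grid for x in row)           # sum of the squared scores
    s = sum(map(sum, grid)) ** 2 / (r * c)                  # squared grand total over rc
    s_r = sum(sum(row) ** 2 for row in grid) / c            # squared row totals over c
    s_c = sum(sum(col) ** 2 for col in zip(*grid)) / r      # squared column totals over r
    return s_x, s, s_r, s_c


scores = [[3, 1, 8],
          [9, 7, 5],
          [6, 4, 2]]
r, c = len(scores), len(scores[0])
s_x, s, s_r, s_c = sums_of_squares(scores)

row_estimate = (s_r - s) / (r - 1)
column_estimate = (s_c - s) / (c - 1)
residual_estimate = (s_x + s - s_r - s_c) / ((r - 1) * (c - 1))
print(row_estimate, column_estimate, residual_estimate)
```

The three sums of squares (18, 6 and 36) add up to 60, which is $S_x - S$, the sum of squared
deviations of all nine scores from the grand mean.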


When our concern is with 3 criteria of classification, the number of possibilities we may wish
to explore involves not only the existence of an additional systematic source of variation but every
possible interaction between all three of them. Which statistic we choose to compare with the
residual as a criterion of homogeneity or non-interaction is an issue which will raise less difficulty
for the beginner if we defer it till we have examined the balance sheet of 13.06. In the derivation
of (i) above we have used the relation

$$E(V_{x.s}) = \frac{rc-1}{rc}\sigma^2$$

Accordingly, we may define a statistic which takes into account variation in both dimensions of
the grid by

$$s_x^2 = \frac{rc}{rc-1} V_{x.s}$$


Some readers may well ask : why should we not prefer $s_x^2$ so defined to $s_z^2$ as defined
by (vi) as our yard-stick of comparison when the end in view is to decide whether the column-means
or row-means yield an adequate estimate of variation in both dimensions of the grid ?
The fact is that we do, in a roundabout way, when we test the significance of a correlation
ratio as in 16.08 below. Indeed, the issue is not referable to the arbitrament of common sense.
In this context, a sufficient answer is that we shall require $s_z^2$ for a different use, if the end in view
is to assess the significance of differences between column- or row-means, when the universe is
not homogeneous in both dimensions; but we may here anticipate another one. We shall
later see that the ratio of $s_r^2$ or of $s_c^2$ to $s_z^2$ is a Type VI distribution, whereas that of either to $s_x^2$,
defined as above, is a Type I distribution. It happens that there are more convenient tables
based on the former available for assessing the consistency of different estimates.


13.04 A BALANCE SHEET FOR TWO CRITERIA


As stated, we shall not at this stage examine the rationale of a test for the consistency of 
estimates of the variance of a putatively homogeneous universe. If the result of such a test, 
as explained in 16.05, leads us to reject the null hypothesis, we may then consider the question : 
what fractions of total variance in the universe of sampling are respectively attributable to 
agencies associated with the several criteria of classification and to residual sources ? 

The credentials of any such balance sheet depend on a new set of assumptions, which we 
may specify under three headings : 


(a) causal, inasmuch as they refer to which effects of different components of variation con- 
tribute to a particular score value ; 

(b) statistical, inasmuch as they refer to the distribution of the score components singly or 
jointly ; 

(c) operational, inasmuch as they depend on the framework of repetition which the 
experimenter has in mind. 


To clarify this threefold distinction, it will be helpful to cite a model experiment in which 
nature and nurture appear as the two criteria of classification. On six consecutive occasions 
with a 4-hour interval between any one and its predecessor or successor, a laboratory worker 
makes one determination of the blood calcium level of each of five rabbits, using the same five 
throughout. If we set out the 30 observations (scores) in a 5 (columns) by 6 (rows) table, we 
have to deal with three putative sources of variation : 


(i) a rhythm of variation within the 24-hour period in one and the same animal, its effect 
being therefore such as to increase variation of the row means ; 
(ii) systematic differences of the absolute level between animals at one and the same time, 
their effect being such as to increase variation of the column-means ; 
(iii) random errors of measurement sufficient to ensure cell to cell variation in the absence
of either of the systematic components, and hence also some variation in row- and 
column-means. 


It is admissible to conceive that each cell score in this set-up has three strictly additive 
components, which we shall refer to respectively as the residual, the column factor and 
the row factor. This constitutes a causal assumption. Ex hypothesi, the row factor varies from
row to row, being constant from column to column and the column factor varies from column to 
column being constant from row to row within the sample ; but we are free to postulate random 
distribution of the residual from cell to cell in each dimension of the grid. This is a statistical 
assumption, as is the postulate that there is zero covariance between the residual and the other 
two components.* 

Neither the assumption of additivity nor that of zero covariance is necessarily true of any
particular situation. They are attractive from a statistical viewpoint, because the variance of
the distribution of the sum of n variates is the sum of their several variances, if the covariances are zero.
This circumstance makes it possible to express the total variance of a system as a sum of additive
components ; and that indeed is what we mean when we speak of a balance sheet of variance.
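The dependence of this additive property on zero covariance can be exhibited with exact
arithmetic. The Python sketch below is illustrative only: the three discrete distributions
standing for the residual, the column factor and the row factor are hypothetical, as are the
variable names. In the first case the components are independent, so that every covariance is
zero; in the second the residual is tied to the column factor.

```python
from fractions import Fraction
from itertools import product


def variance(distribution):
    """Variance of a discrete variate given as (value, probability) pairs."""
    mean = sum(p * v for v, p in distribution)
    return sum(p * (v - mean) ** 2 for v, p in distribution)


half, third = Fraction(1, 2), Fraction(1, 3)
residual = [(-1, half), (1, half)]                   # variance 1
column_factor = [(0, half), (4, half)]               # variance 4
row_factor = [(0, third), (3, third), (6, third)]    # variance 6

# Independent components: joint probabilities are products, so all covariances vanish.
independent_sum = [(e + fc + fr, pe * pc * pr)
                   for (e, pe), (fc, pc), (fr, pr)
                   in product(residual, column_factor, row_factor)]

# Contrast: the residual is -1 whenever the column factor is 0 and +1 whenever it is 4,
# so the two have non-zero covariance and their variances no longer add.
dependent_sum = [(-1 + 0, half), (1 + 4, half)]

print(variance(independent_sum), variance(dependent_sum))
```

In the independent case the variance of the total is the sum 1 + 4 + 6. In the dependent case
the variance of the sum of residual and column factor exceeds 1 + 4 by twice their covariance.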


* The lay-out implies zero covariance of the row and column factors if their effects are strictly additive.


548 CHANCE AND CHOICE BY CARDPACK AND CHESSBOARD 


Having adopted these postulates with more or less plausibility we are not yet ready to pro- 
ceed. For we have still to make an operational assumption, without which no unique solution 
is possible. Until we have decided within what framework of reference we choose to regard 
our experiment as a random sample, we are not in a position to undertake our analysis. In 
effect, this signifies that we have to find an answer to the question : in what way do we propose 
to repeat the experiment ? One may repeat the experiment last cited in four ways : 


(i) by making n different determinations on each animal at one and the same time ;
(ii) by making observations at corresponding times in the course of the 24-hour period on 
the same set of rabbits on successive days ; 
(iii) by making corresponding observations on more than one set of rabbits at identical 
times on one and the same day ; 
(iv) by making corresponding observations on different sets of rabbits on successive days. 


Evidently, the only use of an exhaustive balance sheet exhibiting components of variance 
is to prescribe what is likely to happen, if one does the same thing again. Evidently also, the 
sources of variation are not the same in the four ways which one might choose to regard as doing 
the same thing again in this context. The first implies that row and column factors remain 
constant throughout. The second and third respectively imply that the row factor alone or the
column factor alone varies from one trial to another. The last signifies that both row and column
factors vary. It leaves us free to postulate that they vary from sample to sample at random, 
and hence to invoke with more or less propriety a distribution law consonant with the possibility 
of assigning confidence limits to the entries of our balance sheet. 


FIG. 102. Visualisation of a sample stratum for a lay-out involving 3 criteria of classification:
sample-stratum lay-out of a 2-fold toss of each of 4 coins, 2 being American and 2 French, one
coin of each nationality silver, the other copper (French layer h = 1, American layer h = 2).


On this account, Churchill Eisenhart (op. cit.), who distinguishes between (i) and (iv) above
as Model I and Model II situations, emphasises the distinction between them with particular
reference to : (a) precautions taken to ensure random sampling in the design of the experiment ;
(b) whether the end in view is merely to assess the role of error variance or to effect a complete
partition of the components of variation. Lee Crump (op. cit.), on the other hand, is more explicit
about what we here regard to be the focal issue, viz. the operational intention.

The following assumptions are common to the treatment of the problem in accordance with
the postulates of either Model I or Model II :


(i) Three strictly additive components contribute to the sample cell-score $x_{ij.s}$, in
accordance with the following equation in which $e_{ij.s}$ is the residual, $F_{i.cs}$ the
column factor and $F_{j.rs}$ the row factor :

$$x_{ij.s} = e_{ij.s} + F_{i.cs} + F_{j.rs} \qquad \text{(i.a)}$$

(ii) The covariance of any pair of components in (i.a) is zero, i.e.

$$Cov(e_{ij.s}, F_{i.cs}) = Cov(e_{ij.s}, F_{j.rs}) = Cov(F_{i.cs}, F_{j.rs}) = 0.$$

(iii) If $\sigma^2 = V_x$ is the variance of the distribution of the total score $x_{ij.s}$ in the universe,
and $\sigma_e^2$, $\sigma_c^2$, $\sigma_r^2$ are the corresponding variances of the distributions of the score
components in (i) it follows that

$$\sigma^2 = \sigma_e^2 + \sigma_c^2 + \sigma_r^2.$$

(iv) The residual component $e_{ij.s}$ varies from cell to cell within the row and within the
column of the sample random-wise, so that the distribution of residual score components
is the same in all pillars of the 3-dimensional grid of the complete random
sample distribution.

(v) Within the same layer of the 3-dimensional grid $F_{i.cs}$ varies from cell to cell only
within the row, being fixed for the column, and $F_{j.rs}$ varies from cell to cell only
within the column, being fixed within the row.

(vi) Accordingly, the distribution of the column factor in the sample as a whole is identical
with its row distribution, all rows being alike with reference thereto; and the
distribution of the row factor in the sample as a whole is identical with its column
distribution, all columns being in this respect alike.

FIG. 103. Two sets of assumptions concerning additive score components. Each cell of the
sample grid carries the score $e_{ij.s} + F_{i.cs} + F_{j.rs}$.
MODEL I. Column factor $F_{i.cs}$ constant within column of sample (layer) and within column-slab
of universe (3-dimensional grid). Row factor $F_{j.rs}$ constant within row of sample (layer) and
within row-slab of universe (3-dimensional grid).
MODEL II. Column factor $F_{i.cs}$ constant within column of sample (layer), variable within
column-slab of universe (3-dimensional grid); row-slab and column-slab distributions identical with
one another and with that of whole grid. Row factor $F_{j.rs}$ constant within row of sample (layer),
variable within row-slab of universe (3-dimensional grid); column-slab and row-slab distributions
identical with one another and with that of whole grid.

FIG. 104. The 3-dimensional universe of sampling for the two model situations of Fig. 103
(3 x 3 class universe stratified in 2 dimensions by addition of fixed row and column increments
within the layer).


The postulates peculiar to Model I are 


(vii) $F_{i.cs}$ is fixed for all cells within the same column-slab as well as for all cells within
the same column of the same layer, and $F_{j.rs}$ is fixed for all cells within the same
row-slab as well as for all cells of the same row within the same layer.

(viii) Hence the variance of the row factors within a column as within a layer is $\sigma_r^2$ and the
variance of the column factors within a row as within a layer is $\sigma_c^2$.

Contrariwise, the postulates of Model II will be that $F_{i.cs}$ and $F_{j.rs}$ vary at random from
layer to layer in the sense that

(ix) each row-slab and each pillar therein accommodates a complete random distribution
of column factors identical with the distribution of column factors in the 3-dimensional
grid as a whole, whence also in virtue of (v) and (vi) identical with the distribution
of column factors in the column-slab ;

(x) each column-slab and each pillar therein accommodates a complete random distribution
of row factors identical with the distribution of row factors in the whole
3-dimensional grid, whence likewise in virtue of (v) and (vi) identical with the row-slab
distribution of row factors.


In what follows we shall first explore the consequences of Model II. The only new consideration
of moment arising from the foregoing definitions is that the whole rc-fold sample of
x-scores which supplies us with an rc-fold sample of e-scores is a c-fold sample of column factors
on account of the identity of the rows with respect to the latter and an r-fold sample of row
factors on account of the identity of the columns with respect thereto. We may express this
otherwise by saying that the sample as a whole furnishes us with no information about the
column factors other than what we may infer from the composition of any one of the rows alone,
and no information about the row factors other than what we may infer from any one of the
columns equally.


In accordance with Model II postulates, we shall need symbols for the variance of the score
components as below :

                         Residual       Row factor
Whole sample (layer)     $V_{e.s}$      $V_{r.s} = V_{r.cs}$
Within-row               $V_{e.rs}$     $V_{r.rs} = 0$
Within-column            $V_{e.cs}$     $V_{r.cs}$

From what has been said, the expected values of the above are

$$E(V_{e.s}) = \frac{rc-1}{rc}\sigma_e^2 ; \qquad E(V_{r.s}) = \frac{r-1}{r}\sigma_r^2$$
$$E(V_{e.rs}) = \frac{c-1}{c}\sigma_e^2 ; \qquad E(V_{r.rs}) = 0$$
$$E(V_{e.cs}) = \frac{r-1}{r}\sigma_e^2 ; \qquad E(V_{r.cs}) = \frac{r-1}{r}\sigma_r^2$$


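We now recall the procedure of which the following is a pattern :

$$E_s \cdot M_r(V_{e.rs}) = M_r \cdot E_s(V_{e.rs}) = E(V_{e.rs})$$

If the components have zero covariance

$$E(V_{x.s}) = E(V_{e.s}) + E(V_{r.s}) + E(V_{c.s})$$
$$E(V_{x.rs}) = E(V_{e.rs}) + E(V_{c.rs})$$
$$E(V_{x.cs}) = E(V_{e.cs}) + E(V_{r.cs})$$

Whence we derive :

$$E(V_{x.s}) = \frac{rc-1}{rc}\sigma_e^2 + \frac{r-1}{r}\sigma_r^2 + \frac{c-1}{c}\sigma_c^2$$
$$E_s \cdot M_r(V_{x.rs}) = \frac{c-1}{c}\sigma_e^2 + \frac{c-1}{c}\sigma_c^2$$
$$E_s \cdot M_c(V_{x.cs}) = \frac{r-1}{r}\sigma_e^2 + \frac{r-1}{r}\sigma_r^2$$

If $V_z$ has the same meaning as in 13.03 :

$$E(V_z) = E_s \cdot M_r(V_{x.rs}) + E_s \cdot M_c(V_{x.cs}) - E(V_{x.s})$$
$$= \frac{c-1}{c}\sigma_e^2 + \frac{r-1}{r}\sigma_e^2 - \frac{rc-1}{rc}\sigma_e^2$$
$$\therefore E(V_z) = \frac{(r-1)(c-1)}{rc}\sigma_e^2 \qquad \text{(i)}$$

Accordingly, we may define a statistic by the relations

$$E(s_e^2) = \sigma_e^2 \quad \text{and} \quad s_e^2 = \frac{rc}{(r-1)(c-1)} V_z \qquad \text{(ii)}$$

In the notation of (x)-(xiv) in 11.05

$$s_e^2 = \frac{S_x + S - S_r - S_c}{(r-1)(c-1)} \qquad \text{(iii)}$$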

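Similarly we have

$$E_s \cdot V(M_{x.cs}) = E(V_{x.s}) - E_s \cdot M_c(V_{x.cs}) = \frac{c-1}{rc}\sigma_e^2 + \frac{c-1}{c}\sigma_c^2$$

Accordingly, we define a statistic by the relations

$$E(s_c^2) = \sigma_e^2 + r\sigma_c^2 \quad \text{and} \quad s_c^2 = \frac{rc}{c-1} V(M_{x.cs}) \qquad \text{(iv)}$$

For purposes of computation in accordance with (x)-(xiv) of 11.05

$$s_c^2 = \frac{S_c - S}{c-1} \qquad \text{(v)}$$

In the same way we may derive the statistic defined by

$$E(s_r^2) = \sigma_e^2 + c\sigma_r^2 \quad \text{and} \quad s_r^2 = \frac{rc}{r-1} V(M_{x.rs}) \qquad \text{(vi)}$$

In the sum of squares notation for computation :

$$s_r^2 = \frac{S_r - S}{r-1} \qquad \text{(vii)}$$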


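We may thus set out the Model II balance sheet as follows :

Sums of Squares                           Divisors            Expected value
(see (x)-(xiv) of 11.05)
$rc \cdot V(M_{x.cs}) = S_c - S$          $c - 1$             $\sigma_e^2 + r\sigma_c^2$
$rc \cdot V(M_{x.rs}) = S_r - S$          $r - 1$             $\sigma_e^2 + c\sigma_r^2$
$rc \cdot V_z = S_x + S - S_r - S_c$      $(r - 1)(c - 1)$    $\sigma_e^2$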


By eliminating $\sigma_e^2$ from the first two items we obtain

Component        Unbiased Estimate
$\sigma_c^2$     $\frac{S_c - S}{(r-1)(c-1)} - \frac{S_x - S_r}{r(r-1)(c-1)}$
$\sigma_r^2$     $\frac{S_r - S}{(r-1)(c-1)} - \frac{S_x - S_c}{c(r-1)(c-1)}$
$\sigma_e^2$     $\frac{S_x + S - S_r - S_c}{(r-1)(c-1)}$

Let us now examine the consequences of assuming that the column factor is constant from
sample to sample, i.e.

$$E(V_{c.s}) = \sigma_c^2 = E(V_{c.rs})$$

We then obtain

$$E(V_{x.s}) = \frac{rc-1}{rc}\sigma_e^2 + \frac{r-1}{r}\sigma_r^2 + \sigma_c^2$$
$$E_s \cdot M_r(V_{x.rs}) = \frac{c-1}{c}\sigma_e^2 + \sigma_c^2$$


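This does not affect the derivation of the expected value ($\sigma_e^2$) of $s_e^2$ as defined by (ii) and (iii),
nor that of $s_r^2$ as defined by (vi) and (vii); but

$$E_s \cdot V(M_{x.cs}) = E(V_{x.s}) - E_s \cdot M_c(V_{x.cs})$$
$$= \frac{rc-1}{rc}\sigma_e^2 + \frac{r-1}{r}\sigma_r^2 + \sigma_c^2 - \frac{r-1}{r}\sigma_r^2 - \frac{r-1}{r}\sigma_e^2$$
$$\therefore E_s \cdot V(M_{x.cs}) = \frac{c-1}{rc}\sigma_e^2 + \sigma_c^2$$
$$E(s_c^2) = \sigma_e^2 + \frac{rc}{c-1}\sigma_c^2 \qquad \text{(viii)}$$

Similarly, it will not affect the meaning of $s_e^2$ or $s_c^2$ if we postulate that the row factor remains
constant from sample to sample ; but we then derive

$$E(s_r^2) = \sigma_e^2 + \frac{rc}{r-1}\sigma_r^2 \qquad \text{(ix)}$$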


If both systematic components remain fixed from sample to sample both (viii) and (ix)
hold good ; and our balance sheet (Model I) is as below :

Sums of Squares                           Divisors            Expected value
$rc \cdot V(M_{x.cs}) = S_c - S$          $c - 1$             $\sigma_e^2 + \frac{rc}{c-1}\sigma_c^2$
$rc \cdot V(M_{x.rs}) = S_r - S$          $r - 1$             $\sigma_e^2 + \frac{rc}{r-1}\sigma_r^2$
$rc \cdot V_z = S_x + S - S_r - S_c$      $(r - 1)(c - 1)$    $\sigma_e^2$


Alternatively, we may put it in the form :

Component        Unbiased Estimate
$\sigma_c^2$     $\frac{S_c - S}{c(r-1)} - \frac{S_x - S_r}{rc(r-1)}$
$\sigma_r^2$     $\frac{S_r - S}{r(c-1)} - \frac{S_x - S_c}{rc(c-1)}$
$\sigma_e^2$     $\frac{S_x + S - S_r - S_c}{(r-1)(c-1)}$
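The two balance sheets differ only in the multiplier attached to the systematic component of
each expected value, so that one short routine can serve both. The Python sketch below is
illustrative: the function names are assumptions introduced here; the multipliers r and c
(Model II) and rc/(c − 1) and rc/(r − 1) (Model I) are those of the expected values of the column
and row estimates in the respective balance sheets; and the sums of squares are those of the
3 by 3 numerical example of 13.03.

```python
from fractions import Fraction


def mean_squares(s_x, s, s_r, s_c, r, c):
    """Column, row and residual estimates formed from the four sums of squares."""
    return (Fraction(s_c - s, c - 1),
            Fraction(s_r - s, r - 1),
            Fraction(s_x + s - s_r - s_c, (r - 1) * (c - 1)))


def components(s_x, s, s_r, s_c, r, c, model):
    """Estimates of the column, row and residual components under Model I or Model II."""
    column_ms, row_ms, residual_ms = mean_squares(s_x, s, s_r, s_c, r, c)
    if model == "II":
        # Expected values: residual + r * column component, residual + c * row component.
        column_weight, row_weight = Fraction(1, r), Fraction(1, c)
    elif model == "I":
        # Expected values: residual + rc/(c-1) * column component, and likewise for rows.
        column_weight, row_weight = Fraction(c - 1, r * c), Fraction(r - 1, r * c)
    else:
        raise ValueError("model must be 'I' or 'II'")
    return (column_weight * (column_ms - residual_ms),
            row_weight * (row_ms - residual_ms),
            residual_ms)


sums = dict(s_x=285, s=225, s_r=243, s_c=231, r=3, c=3)
model_ii = components(model="II", **sums)
model_i = components(model="I", **sums)
print(model_ii, model_i)
```

For this grid the row component comes out as zero and the column component as negative under
either model. Estimates obtained by subtraction are unbiased but not constrained to be positive,
so that a negative value signifies no more than that the column estimate happens to fall short of
the residual estimate in this particular sample.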
13.05 THE ADDITIVE PRINCIPLE


In seeking a rationale for the construction of a balance sheet of variation we have postulated a 
universe of scores with 3 additive components, a column factor, a row factor and a residual. On 
the assumption that there is zero covariance between any pair of them, the true variance of the 
composite score in the universe of choice is the sum of the variances of the three components, 
i.e. $\sigma^2 = \sigma_c^2 + \sigma_r^2 + \sigma_e^2$. Thus $\sigma_e^2$ stands for what the total variance would be if there were no
source of systematic variation associated with the row and column criteria of classification. 
The tidiness of this relation has a deceptive air of finality. So it is important that the student 
should understand what factual assumptions entitle us to construct a balance sheet in accordance 
with the algebraic postulates of 13.04. 

From the factual viewpoint, the pivotal postulate is that the effects of both row and column 
factors are strictly additive. This signifies that effects of sources of variation associated with
the two-class systems are such as to change the mean value of the row or column score distribution 
without changing its form or scale. Now it is easy to imagine many other ways in which inter- 
class variation might arise. In much experimental work, change of scale or dispersion without 
change of mean in virtue of the competence of the worker is just as likely an assumption, perhaps 
more so. Hence the attractiveness of the additive postulate resides less in its relevance to external
nature than in its convenience to the mathematician. Of this, as of other assumptions com- 
monly made in relation to the same class of procedure, we may cite the comment of Churchill 
Eisenhart (1947): “the only motivation that has been given is the more general nature of the 
inferences that may be drawn . . . when it is satisfied ”. 

In any real situation, it therefore behoves us to ask whether the additive postulate is indeed 
plausible; and it is conspicuously open to question in the field of earliest and most extensive 
applications of variance analysis. Here again the remarks of Churchill Eisenhart (op. cit.) are 
explicit and salutary : 


Hence, when additivity does not prevail we say that there are interactions between row and column 
factors. Thus, in the case of varieties and treatments . . . additivity implies that, under the
general experimental conditions of the test, the true mean yield of one variety is greater (or less) 
than the true mean yield of another variety by an amount—an additive constant, not a multiplier— 
that is the same for each of the treatments concerned, and, conversely, the true mean yield with 
one treatment is greater (or less) than the true mean yield with another treatment by an amount 
which does not depend upon the variety concerned ; which is exactly what is meant when we say 
that there are no “ interactions ” between varietal and treatment effects. 


Mathematicians who are not conversant with the vagaries of gene exhibition and biologists 
who are not at home with the technical intricacies of the thesis expounded by the writer of the 
remarks cited above will not regard it as unprofitable to pinpoint what is of cardinal importance 
to the present discussion by reference to a naturalistic illustration. Our supposition is that we 
record in 3 different environments the size (yield) attained at a given age by individuals of two 
species of flowering plants, one (A) being calciphil and the other (B) being calciphobe. The 
three environments (treatments) are then the native soil (untreated), native soil with addition of a 
neutral calcium salt and native soil treated with a neutral potassium salt. To drive home the 
point Churchill Eisenhart makes in his reference to treatment (nurture), variety (nature) and 
yield (phenotype), we may disregard the residual component of the cell-score (yield) arising 
from random errors of measurement and uncontrolled subsidiary difference w.r.t. environment. 
If we denote the cell-score in the absence of residual error so defined as $u_{ij}$, the column (species)
factors respectively by $F_a$ and $F_b$, and the row (treatment) factors as $F_1$, $F_2$ and $F_3$, our set-up
as prescribed by the additive postulate is

                     Species A                  Species B
Untreated soil       $u_{11} = (F_a + F_1)$     $u_{21} = (F_b + F_1)$
Soil + Ca salt       $u_{12} = (F_a + F_2)$     $u_{22} = (F_b + F_2)$
Soil + K salt        $u_{13} = (F_a + F_3)$     $u_{23} = (F_b + F_3)$

The implications of this become more obvious, if we set the result out thus :

                 Species A                               Species B
Effect of Ca     $(u_{12} - u_{11}) = (F_2 - F_1) = (u_{22} - u_{21})$
Effect of K      $(u_{13} - u_{11}) = (F_3 - F_1) = (u_{23} - u_{21})$


The above schema signifies that a fixed excess of Ca increases the size of B and A by an equal 
amount, a statement which is inconsistent with our own initial assumption that the two species 
are respectively calciphobe and calciphil. Likewise the additive postulate signifies in this 
context that B and A react by equal responses to a fixed increment of K, an assertion inconsistent 
with general experience of ionic antagonisms in the biological domain. On the contrary, we 
should expect a calciphil species, which reacts by increased yield to increase of Ca soil content, 
to react by diminished yield to excess of K, and a calciphobe species which reacts by diminished 
yield to increase of the Ca soil content, to react by increased yield to excess of K. 
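A small numerical sketch may make the contrast concrete. Everything in it is hypothetical: the
species and treatment factors, the yields of the second table and the function name are invented
for illustration and are not data from any experiment. The first table is built by strict addition
of a species factor and a treatment factor; the second gives the calciphil species A and the
calciphobe species B opposite responses to Ca and to K.

```python
def treatment_effects(yields):
    """Per species, the effects of Ca and of K relative to the untreated soil."""
    return {species: (row["Ca"] - row["untreated"], row["K"] - row["untreated"])
            for species, row in yields.items()}


# Hypothetical additive set-up: cell-score = species factor + treatment factor.
species_factor = {"A": 10, "B": 14}
treatment_factor = {"untreated": 0, "Ca": 3, "K": -2}
additive = {species: {treatment: fs + ft for treatment, ft in treatment_factor.items()}
            for species, fs in species_factor.items()}

# Hypothetical antagonistic set-up: A gains from Ca and loses from K, B the reverse.
antagonistic = {"A": {"untreated": 10, "Ca": 13, "K": 8},
                "B": {"untreated": 14, "Ca": 11, "K": 16}}

additive_effects = treatment_effects(additive)
antagonistic_effects = treatment_effects(antagonistic)
print(additive_effects, antagonistic_effects)
```

In the additive table both species show the same effect of each treatment. In the antagonistic
table the effects are equal in size but opposite in sign, so that no single pair of treatment
increments can describe both columns.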

From the field of interspecific variation, it would be possible to cite numerous examples 
of comparable situations, but the writer has sufficiently emphasised their occurrence within the 
domain of intraspecific variation. More recently Haldane* has classified known types of 
interaction, i.e. departure from the assumption of additivity, by recourse to experimental data. 
It is indeed open to question whether there exists any nature-nurture situation about which we 
can make any such assumptions with confidence in the absence of corroboration, or whether it 
will often happen that such an assumption is plausible. What is certain, as illustrated by the 
foregoing example, is likewise embodied in the adage that one man’s meat is another man’s poison. 
Many situations arise in which stimulus X increases response of genotype A and inhibits that of 
genotype B, while stimulus Y decreases response of genotype A and augments that of genotype 
B. That it is necessary to remind the biologist familiar with his materials that the additive 
postulate may therefore be grossly inapplicable to a set-up in which the two criteria of classification
are nature and nurture, is because relatively few biologists who invoke the technique under dis- 
cussion with a view to the construction of a balance sheet exhibiting the respective contributions 
of nature and nurture variables realise that the additive postulate is in fact the keystone of the 
entire edifice. 

Accordingly, we may thus sum up the outcome of our enquiry at this stage : 


(1) The possibility of constructing a true bill which sets out what fractions of total population 
variance are respectively attributable to one or other source of variation specified by 
the class criteria and to residual errors of observation or other uncontrolled circumstances 
presupposes the truth of the postulate that the components are additive ; 


(2) From inspection of the data of a single small-scale experiment it is never possible to 
infer with certainty that this postulate is valid, and there will arise many situations in 
which it is grossly incorrect. 

* J. B. S. Haldane (1946), "The Interaction of Nature and Nurture". Ann. Eugen. 13, 197.

These considerations prompt us to ask: is it possible to justify the additive postulate by
recourse to observation, and if so, how ? To answer this, we shall suppose that the joint contribution
of $F_{i.cs}$ and $F_{j.rs}$ to the score value exceeds or falls short of their sum by an amount
$F_{ij.s}$ which varies from cell to cell, i.e. :

$$x_{ij.s} = e_{ij.s} + F_{i.cs} + F_{j.rs} + F_{ij.s}$$

Evidently the expected value of $s_e^2$ in 13.04 will not be $\sigma_e^2$ unless $F_{ij.s} = 0$, and the design of
an experiment involving single score values for each cell in a 2-way lay-out provides no occasion
for distinguishing between two components which both vary from cell to cell. On the other
hand, their effects are distinguishable if we resort to n-fold replication, i.e. n-fold repetition of
each observation without changing the row and column sources of variation. In such an experimental
design, we conceive our sample as a stratum of n layers. The residuals vary random-wise
from cell to cell within a layer and from layer to layer within a pillar. If the replication is
faithful, the component $F_{ij.s}$ varies from cell to cell within a layer but not within a pillar.
Accordingly, we can ask whether the measures of cell-to-cell variation within a pillar and
within a stratum are consistent, i.e. if $F_{ij.s} = 0$, in a set-up for 3 criteria of classification
involving replication as the new one.

We shall examine this issue in 13.06. Here it suffices to point out that the construction of 
a balance sheet for an experiment involving replication is valid only if : (a) the analysis fails 
to disclose a new component of variation ; (b) we have other reasons for assuming that the 
3 systematic components conform to the postulates of additivity and zero covariance. If the 
results of identical replication do not confirm the assumption that the postulate is valid, the 
inclusion of a separate component of interaction as defined by Churchill Eisenhart in the 
balance sheet of causality merely serves to announce that the procedure for constructing it is 
defective, hence also that it is not a true bill. 

Before we examine the credentials of the balance sheet for a replication experiment, it is 
fitting to examine what we may rightly infer, if there is indeed good reason to believe that the 
additive principle holds good. We may then interpret our balance sheet as 


(i) a recipe for assigning to what errors mean measurements are subject when we exclude
one or other source of variation ; 

(ii) an overall picture of how much variability remains when we eliminate one or other 
source. 


To clarify the meaning of (i), the illustrative experiment already cited will serve our purpose.
At a given time of day, the data supply us with a mean figure for the blood calcium level of
different rabbits. This figure is therefore subject both to residual sampling error inherent in
the method of measurement and to variation arising from the fact that different measurements
refer to 5 different individuals. The unbiased estimate of the residual variance being $s_e^2$, that of
the mean of a 5-fold sample is $\tfrac{1}{5}s_e^2$ in virtue of (vii) in 7.03 of Vol. I. Alternatively, we may ask
what would be the sampling variance for the mean of the series of 6 determinations on the same
rabbit at different times of day or night, i.e. to what sampling variance our column means referable
to the same rabbit are subject as the result of errors of measurement alone. In this case,
our concern is with the mean of a 6-fold sample, and the required parameter is $\tfrac{1}{6}s_e^2$. In general,
we may say that the mean row-scores and the mean column-scores are respectively subject to
sampling variance of $(s_e^2 \div c)$ and $(s_e^2 \div r)$. In the writer's view, this is the most fruitful use of
the procedures subsumed by the term analysis of variance, if only because it operates within
the domain of estimation and therefore entails none of the debatable issues raised by recent
controversy concerning decisions made within the framework of a unique null hypothesis.
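
As a numerical sketch of interpretation (i), the lines below divide a residual variance estimate by the number of scores which enter each mean. The figure 0.36 is purely hypothetical, since the text quotes no value for the estimate; the counts of 5 rabbits and 6 times of day are those of the illustrative experiment, and the function name is our own.

```python
def sampling_variance_of_mean(residual_variance, sample_size):
    """Sampling variance of the mean of `sample_size` independent scores."""
    return residual_variance / sample_size


s2_e = 0.36   # hypothetical residual variance estimate (not a figure from the text)
c, r = 5, 6   # rabbits (columns) and times of day (rows)

# Mean for one time of day: average over the 5 rabbits.
row_mean_variance = sampling_variance_of_mean(s2_e, c)
# Mean for one rabbit: average of its 6 determinations.
column_mean_variance = sampling_variance_of_mean(s2_e, r)
```

On this footing the mean for a given time of day carries rather more sampling variance than the mean for a given rabbit, simply because fewer scores contribute to it.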
An alternative conception of the sort of questions an accredited Balance Sheet of Variance
may answer brings into focus an important difference between Model I and Model II of 13.04.
As an assemblage of unbiased estimates of universe components of variance, the balance sheet
would appear to be just as valid, if constructed on one or other assumption ; but we may wish
to take the further step of placing confidence limits (see 16.03 below) around each of our estimates
of the components. To do so, we must then invoke certain assumptions concerning the distribution
of the row and column factors. If we scrutinise our experimental data through the
spectacles of Model II, we are free to postulate with more or less justification a normal distribution
of all three score components in the universe as a whole. Thereafter the problem stated is purely
mathematical, if we are entitled to regard the choice of sample as random.

American writers on analysis of variance are not slow to stress the fact that random choice 
of column- or row-score components is often inconsistent with experimental design, as in the 
following remarks of Churchill Eisenhart : 


“. . . when an experimenter selects two or more treatments, or two or more varieties, for testing,
he rarely, if ever, draws them at random from a population of possible treatments or varieties ; 
he selects those that he believes are most promising. Accordingly Model I is generally appropriate 
where treatment, or variety comparisons are involved. On the other hand, when an experimenter 
selects a sample of animals from a herd or a species, for a study of the effects of various treatments, 
he can insure that they are a random sample from the herd, by introducing randomization into 
the sampling procedure, for example, by using a table of random numbers. But he may consider 
such a sample to be a random sample from the species, only by making the assumption that the herd 
itself is a random sample from the species. In such a case, if several herds (from the same species) 
are involved, Model II would clearly be appropriate with respect to the variation among the animals 
from each of the respective herds, and might be appropriate with respect to the variation of the herds 
from one another.” 


Lee Crump (1946) writes in the same vein : 


“A Note of Warning. It must be remembered that in using the analysis of variance to estimate
variance components, we have assumed the elements of the fundamental equation to be randomly
selected from an infinite population. In an experiment where three widths of spacing some crop
are purposely selected for trial, it is not reasonable to regard these widths as random samples from
all possible widths. On the other hand the blocks in a field experiment may sometimes quite
reasonably be regarded as a random sample of all such blocks. In sampling production from, say,
three machines in a factory, where these machines constitute all the machines which the factory has
or is likely to have, it is more reasonable to regard these machines as the whole of a finite population
than to consider them as random samples from some infinite population. If the factory owner is
sampling production with a view to purchasing more machines of the same type, the three machines
may be appropriately regarded as samples of the infinite population made up of all machines of
the same type.”


The more reasonable attitude to the three machines as the whole of a finite population is 
in fact a Model I view of the situation ; but the same example also brings into focus a semantic 
difficulty which besets justification of the alternative view. Indeed, the foregoing remarks of 
Churchill Eisenhart resolve only part of the difficulty of justifying the assumption that choice 
of classificatory variables is truly random. To be sure, we can choose cows of a particular herd at 
random, but if we do, our assessment legitimately refers only to that herd. To extend it justifiably 
to others of the same breed, we have to invoke the additional assumption that the range of intra- 
specific variation does not materially differ from herd to herd ; and to say that this assumption 
is itself unjustifiable deprives the assessment of public utility. In the nature of things, random
choice of fertilisers from among all the possible chemicals which curiosity or perversity may prompt
the investigator to add to the soil is a concept devoid of operational meaning ; and random choice
of varieties within a species or of individuals within a variety is a concept we can justify without
recourse to a God's eye view of the universe only as a description of a local set-up. For a truly random
choice of rabbit varieties in Kent would not be a truly random choice of rabbits in Kentucky.

T. H. Huxley remarked rightly that mathematics is a mill which cannot grind out ingredients 
other than those put into it. What is true of any statistical technique is therefore true of analysis 
of variance, and especially so. For statisticians of an earlier vintage never identified their terms 
of reference with so ambitious a title as the design of experiments. No statistical technique is an 
adequate substitute for common sense, alertness to the nature of the problem on the part of those 
who ought to be clear about it or for ingenuity directed to the removal of irrelevant variables in 
an experimental set-up. Indeed, it is well to remind ourselves that experimental science, in 
its assault on problems most successfully attacked by experimental methods to date, had advanced 
far towards its present stature without recourse to statistical principles of any sort. 

It is therefore necessary to insist that analysis of variance—like any other sort of statistical 
procedure—has a limited sphere of usefulness, especially because its legitimate uses are at present 
difficult to assess against a background of novel logical premises and, for most of us, unfamiliar 
mathematical procedures. In the situation we have used as an illustration of a 2-way classifica- 
tory set-up, our assumption has been that the investigator wishes to ascertain with as great 
economy as possible whether the blood calcium level of rabbits is or is not subject to a diurnal 
rhythm, i.e. a rise and fall within a 24-hour period. A balance sheet of variance which exhibits 
a significantly large component w.r.t. observations on different animals at different times of the 
day does not in fact answer the question last stated, such a result being consistent with a quite 
erratic fluctuation during a particular time interval such as 24 hours. In so far as the analysis 
bears on the problem stated, it is helpful because we can state to how much sampling variance our 
mean values for determination at different times of day are subject when we have eliminated 
all sources of individual variation. Hence we can see whether there is a consistent trend of 
our mean values throughout a 24-hour period without recourse to the more homely custom of 
repeated experiments of the same sort. In such a situation, the experimentalist is entitled to 
prefer the assurance of a consistent answer by recourse to laboratory experience of several days'
duration to the consolations of mathematics ; but there may well arise situations in the practice
of industry or in sociological enquiry such as to commend a first approach which is more 
economical. 

With full recognition of the existence of situations in which an economical preview is 
indeed advantageous, it remains none the less true that no statistical procedure can rightly claim 
to provide a rationale for the design of experiments regardless of the end in view ; and a widely 
quoted illustration of the use of analysis of variance in particular is instructive as a warning 
against any such mechanical view of the value of statistical methods. In an early issue of 
Biometrika, Oswald Latter (1902) published the result of measuring the length of 1572 eggs of
the cuckoo including 264 assignable to known foster parents of 6 different species. The odds
are in fact about 100 : 1 that variation between nests of one or other type is not wholly explicable
in terms of variation within nests. Since 1902 the same set of figures has passed from one
textbook to another to illustrate one or other statistical technique fashionable at the time, latterly
as an illustration of homogeneity tests w.r.t. a one-way classification involving unequal sample
numbers as in 13.07. Indeed, the writer of a comparatively recent book on statistics for sociologists
introduces the topic with the complaint that “it is a considerable jump from lengths
of eggs in a cuckoo’s nest . . . to sociological problems”.*


* Margaret Jarman Hagood : Statistics for Sociologists (1941). 
The slip-up in the last sentence cited is pardonable, since sociologists are under no obligation
to know that cuckoos do not have nests, unless current texts which trade in this exhibit disclose 
what is the end in view. On the other hand, it would be difficult to offer the naturalist an example 
of the use of statistics less calculated to inspire confidence. Similarities with respect both to 
colour and form between the cuckoo’s egg and those of the foster parent had long been a matter 
of comment and discussion among enthusiastic bird watchers and egg collectors. It was also 
well known that cuckoo eggs have a character peculiar to the locality where found, as discussed 
by Eugene Rey (1892) in his book Old and New information concerning the domestic economy of 
the Cuckoo. One may presume that Latter, himself a first-rate naturalist, knew all this when he 
chose the topic ; and one may be confident that he would have been able to throw further light 
on it if the hypnotic influence of Pearson’s apotheosis of measurement as an end in itself had not 
enlisted his industry in an undertaking unlikely to add anything to what was already common- 
place among bird watchers. 

Indeed, it is scarcely too much to say that no author who uses Latter’s data as exemplary 
material has been able to convince the reader that the outcome of the ensuing arithmetical 
exploits has greatly advanced biological knowledge. It is also safe to say that it put at the 
disposal of those who have later clarified the enigma no helpful clue for their use or guidance. 
The facts, disclosed by E. P. Chance (The Truth about the Cuckoo, 1940) as we now know them, 
are the result of painstaking observations on the behaviour of individual cuckoos during the same 
and successive seasons. Briefly they are as follows. The same cuckoo returns in successive years 
to the same territory and almost invariably lays its eggs in the nest of a particular species. All 
the available evidence points to the conclusion that a cuckoo reared in a particular territory 
mates with another cuckoo reared in the same territory. In short, cuckoos are divisible into 
local sub-species each with its dominant foster-parent type, and selection has presumably 
ensured the survival of genotypes most fitted to lay eggs acceptable to the latter. 

Thus the truth about the cuckoo as it here concerns us is that a much-publicised statistical 
enquiry did not in fact draw attention to a new problem, and it did little if anything to clarify 
one which field naturalists already recognised as such. It is not easy to see how it would have 
been possible to elicit the relevant facts by methods other than intensive work of field observers, 
for the most part allergic to statistics of any sort. Statisticians who wish to enlist greater respect 
for the proper uses of statistics would therefore be wise to refrain from further comment on the 
cuckoo question when their aim is to show how statistics can help the field worker. 

One caveat it is still necessary to emphasise in this context concerns a class of judgments 
common to many situations involving multiple, as opposed to binary, classification. So long 
as our preoccupation is with only two classes the issue of homogeneity is straightforward. Either 
the statistical data referable to the two samples are indicative of a real difference or they are not. 
When we turn our attention to a system of more than 2 classes the assertion that there exists a 
real inter-class difference may signify at opposite extremes: (i) a graded effect distinguishing 
any one class from every other ; (ii) a clear-cut threshold response which may differentiate only 
one class from any other. We meet with clear-cut threshold effects very commonly in biological 
enquiry ; and we have no reason to disregard the possibility of doing so in social science. Where 
this is so, a multiple classification of the data may conceal or obscure a real difference which 
two-fold division at an appropriate level would bring sharply into focus. In the last resort, 
any statistical technique referable to a system of many classes will be more or less useful to 
the extent that the investigator exercises good judgment of his materials in the initial task of 
classifying them.
13.06 BALANCE SHEET FOR THREE CRITERIA


An analysis involving three criteria of classification may be replicative or complete. The first,
sometimes referred to as incomplete 3-factor analysis, signifies that the third criterion of classification
is simply repetition, as when we have n score values for each of rc cells in a lay-out involving
two specific taxonomical categories. The second, referred to as complete three-factor analysis
(without interaction), signifies that each of the n observations constitutes a member of a class,
as in the coin model of Fig. 102.

To clarify the replicative case we may imagine an experimental design of the following type.
The scores $a_{ij}$ and $b_{ij}$ respectively refer to the red cell count of different blood samples from one
and the same female rabbit at one and the same time of day :


                     Rabbit I                Rabbit II

12 noon              $a_{11}$   $b_{11}$     $a_{21}$   $b_{21}$

12 midnight          $a_{12}$   $b_{12}$     $a_{22}$   $b_{22}$


In this set-up we have initially 2 specific criteria of classification: type of individual
(columns) and time of day (rows). Precise repetition should lead to consistent estimates both of
the error variance and that of the putatively additive row and column factors, if there is indeed
zero covariance between the row and column factors, though a cell factor indistinguishable
from the residual variance in a single trial would be separable in a repetition involving no new
source of systematic variation. In that event, we should be able to distinguish from true error,
which varies from cell to cell in any one experiment and from cell to corresponding cell in successive
experiments, a component which varies only from cell to cell in any single experiment
being constant in corresponding cells of successive experiments. The words in italics are
the operative phrase in the sentence above. In addition to residual sources of variation arising
from errors in connexion with each of the 8 counts involved, we may conceive a systematic
source of error introduced by defective procedure, e.g. the use of a separate syringe needle for
each rabbit at each time of day. We may then speak of a cell factor $F_{ij.s}$ which varies from
cell to cell like the residual $e_{ij.s}$ of 13.04 but is constant within the cell. To represent this
conception we need to label a third (within-cell) dimension ($h = 1, 2 \ldots n$) of variation and
our score components as follows :


Column Factor . . . $F_{i.cs}$          Row Factor . . . $F_{j.rs}$
Cell Factor . . . . $F_{ij.s}$          Residual . . . . $e_{hij.s}$


We may then visualise the foregoing lay-out as below : 


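$$\begin{array}{ll}
a_{11} = e_{111.s} + F_{11.s} + F_{1.cs} + F_{1.rs} & a_{21} = e_{121.s} + F_{21.s} + F_{2.cs} + F_{1.rs} \\
b_{11} = e_{211.s} + F_{11.s} + F_{1.cs} + F_{1.rs} & b_{21} = e_{221.s} + F_{21.s} + F_{2.cs} + F_{1.rs} \\
a_{12} = e_{112.s} + F_{12.s} + F_{1.cs} + F_{2.rs} & a_{22} = e_{122.s} + F_{22.s} + F_{2.cs} + F_{2.rs} \\
b_{12} = e_{212.s} + F_{12.s} + F_{1.cs} + F_{2.rs} & b_{22} = e_{222.s} + F_{22.s} + F_{2.cs} + F_{2.rs}
\end{array}$$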


From a formal point of view, the outcome of the analysis will be the same whether
we interpret $F_{ij.s}$ as: (a) an unsuspected independent systematic source of variation,
or (b) a covariance term which betokens interaction between row and column factors.
If we conceive our sample as a 3-dimensional stratum of two layers, the systematic cell
factor is a constant of the stratum-pillar, and it is important to notice that the order of the cell
entries in this set-up is irrelevant. This is not so if we design a comparable experiment to throw
light on the possibility that sex, breed and time of day are all significant sources of variation, as
below :


                     Angora                  Flemish

12 noon              Male 1    Female 1      Male 2    Female 2

12 midnight          Male 1    Female 1      Male 2    Female 2


Here we are dealing with a third criterion of classification which is specific, and we may then
postulate a factor ($F_{h.ns}$) definitive of the layer as in Fig. 102, and constant from layer to layer
within the stratum of the 3-dimensional sample grid. If there is no interaction between the
three specific factors, our system of score components is therefore :


$$\begin{array}{ll}
a_{11} = e_{111.s} + F_{1.cs} + F_{1.rs} + F_{1.ns} & a_{21} = e_{121.s} + F_{2.cs} + F_{1.rs} + F_{1.ns} \\
b_{11} = e_{211.s} + F_{1.cs} + F_{1.rs} + F_{2.ns} & b_{21} = e_{221.s} + F_{2.cs} + F_{1.rs} + F_{2.ns} \\
a_{12} = e_{112.s} + F_{1.cs} + F_{2.rs} + F_{1.ns} & a_{22} = e_{122.s} + F_{2.cs} + F_{2.rs} + F_{1.ns} \\
b_{12} = e_{212.s} + F_{1.cs} + F_{2.rs} + F_{2.ns} & b_{22} = e_{222.s} + F_{2.cs} + F_{2.rs} + F_{2.ns}
\end{array}$$


If we postulate strict additivity our score equations will be


(a) Incomplete 3-factor analysis (2 factors with replication) :

$$x_{hij.s} = e_{hij.s} + F_{i.cs} + F_{j.rs} + F_{ij.s} \qquad \text{(i)}$$

(b) Complete 3-factor analysis without interaction :

$$x_{hij.s} = e_{hij.s} + F_{i.cs} + F_{j.rs} + F_{h.ns} \qquad \text{(ii)}$$


For the 3-dimensional sample of ncr score values the following table gives the number of
sub-sample values of each component :


                              Column    Row       Layer     Cell
                  Residual    Factor    Factor    Factor    Factor
Whole grid        ncr         c         r         n         cr
Column-slab       nr          1         r         n         r
Row-slab          nc          c         1         n         c
Layer             cr          c         r         1         cr
Pillar            n           1         1         n         1


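In conformity with our previous symbolism, we shall use $\sigma_{cr}^2$ to signify the true variance of
the cell-factor distribution and $\sigma_n^2$ to signify the true variance of the layer-factor. We may then
specify the expected values of the sample components as follows in accordance with the postulate
of Model II in 13.04:

$$\begin{array}{l|ccccc}
 & \text{Residual} & \text{Column Factor} & \text{Row Factor} & \text{Layer Factor} & \text{Cell Factor} \\ \hline
\text{Whole grid} & \frac{ncr-1}{ncr}\sigma_e^2 & \frac{c-1}{c}\sigma_c^2 & \frac{r-1}{r}\sigma_r^2 & \frac{n-1}{n}\sigma_n^2 & \frac{cr-1}{cr}\sigma_{cr}^2 \\
\text{Column-slab} & \frac{nr-1}{nr}\sigma_e^2 & 0 & \frac{r-1}{r}\sigma_r^2 & \frac{n-1}{n}\sigma_n^2 & \frac{r-1}{r}\sigma_{cr}^2 \\
\text{Row-slab} & \frac{nc-1}{nc}\sigma_e^2 & \frac{c-1}{c}\sigma_c^2 & 0 & \frac{n-1}{n}\sigma_n^2 & \frac{c-1}{c}\sigma_{cr}^2 \\
\text{Layer} & \frac{cr-1}{cr}\sigma_e^2 & \frac{c-1}{c}\sigma_c^2 & \frac{r-1}{r}\sigma_r^2 & 0 & \frac{cr-1}{cr}\sigma_{cr}^2 \\
\text{Pillar} & \frac{n-1}{n}\sigma_e^2 & 0 & 0 & \frac{n-1}{n}\sigma_n^2 & 0
\end{array}$$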

We are entitled to interpret the 3-way complete analysis without interaction in terms of
Model I, in which case $\sigma_c^2$, $\sigma_r^2$ and $\sigma_n^2$ respectively replace $\frac{c-1}{c}\sigma_c^2$,
$\frac{r-1}{r}\sigma_r^2$ and $\frac{n-1}{n}\sigma_n^2$ in the
foregoing table. We cannot interpret the expected value of the cell-factor variance within the
column-slab or within the row-slab in terms of $\sigma_{cr}^2$, if we adopt the same postulate. Accordingly,
we shall restrict ourselves in what follows to the alternative assumption.

Without examining the implications of Model II at this stage we shall now explore some
of the consequences of the additive principle. Two statistics are common to both cases of
3-factor analysis specified above, viz. the variances of the row- and column-means :


$$E[V(M_{x.cs})] = E[V_{x.s}] - E[M(V_{x.cs})]$$
$$E[V(M_{x.rs})] = E[V_{x.s}] - E[M(V_{x.rs})]$$


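Incomplete 3-factor Analysis :


If we proceed in accordance with (i) on the assumption that our third source of variation is
peculiar to the cell, our table gives


$$E[V(M_{x.cs})] = \left(\frac{ncr-1}{ncr} - \frac{nr-1}{nr}\right)\sigma_e^2 + \frac{c-1}{c}\sigma_c^2 + \left(\frac{cr-1}{cr} - \frac{r-1}{r}\right)\sigma_{cr}^2$$

$$\therefore\; E[V(M_{x.cs})] = \frac{c-1}{ncr}\sigma_e^2 + \frac{c-1}{c}\sigma_c^2 + \frac{c-1}{cr}\sigma_{cr}^2 \qquad \text{(iii)}$$

Similarly

$$E[V(M_{x.rs})] = \frac{r-1}{ncr}\sigma_e^2 + \frac{r-1}{r}\sigma_r^2 + \frac{r-1}{cr}\sigma_{cr}^2 \qquad \text{(iv)}$$


The variance within the pillar depends only on the residual, as is evident from the table, since
the layer factor is zero in this context, i.e.


$$E[M(V_{x.crs})] = \frac{n-1}{n}\sigma_e^2 \qquad \text{(v)}$$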
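
The subtraction which leads to (iii) and (iv) may be checked mechanically. The sketch below assumes, as in the table of expected values, that a component taking k values within a sub-sample contributes (k - 1)/k of its true variance; the function and dictionary names are our own and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def coefficient(k):
    """Expected sample-variance coefficient (k - 1)/k of a component taking k values."""
    return Fraction(k - 1, k)


def value_counts(n, c, r):
    """Number of values of each score component inside each kind of sub-sample
    for the incomplete (replicative) case: residual, column, row and cell factors."""
    return {
        "whole grid":  {"residual": n * c * r, "column": c, "row": r, "cell": c * r},
        "column-slab": {"residual": n * r,     "column": 1, "row": r, "cell": r},
        "row-slab":    {"residual": n * c,     "column": c, "row": 1, "cell": c},
        "pillar":      {"residual": n,         "column": 1, "row": 1, "cell": 1},
    }


def expected_variance_of_means(n, c, r, within):
    """Coefficients of E[V(M)] = E[V of whole grid] - E[M(V within the sub-sample)]."""
    counts = value_counts(n, c, r)
    return {
        name: coefficient(counts["whole grid"][name]) - coefficient(counts[within][name])
        for name in counts["whole grid"]
    }


n, c, r = 3, 4, 5   # arbitrary illustrative dimensions
column_means = expected_variance_of_means(n, c, r, "column-slab")
row_means = expected_variance_of_means(n, c, r, "row-slab")
```

For these dimensions the coefficients of the column means come out as (c - 1)/ncr, (c - 1)/c, 0 and (c - 1)/cr for the residual, column, row and cell factors, in agreement with (iii).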
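We now recall the statistic $V_{cr}$ defined by (xxv) of 11.05, viz. :

$$V_{cr} = V_{x.s} - V(M_{x.cs}) - V(M_{x.rs}) - M(V_{x.crs})$$


Whence from (iii)-(v) :

$$E(V_{cr}) = \frac{(r-1)(c-1)}{ncr}\sigma_e^2 + \frac{(r-1)(c-1)}{cr}\sigma_{cr}^2$$


Accordingly, we may define the following statistics :

$E(s_c^2) = \sigma_e^2 + n\sigma_{cr}^2 + nr\sigma_c^2$   if   $s_c^2 = ncr \cdot V(M_{x.cs}) \div (c - 1)$ ;
$E(s_r^2) = \sigma_e^2 + n\sigma_{cr}^2 + nc\sigma_r^2$   if   $s_r^2 = ncr \cdot V(M_{x.rs}) \div (r - 1)$ ;
$E(s_{cr}^2) = \sigma_e^2 + n\sigma_{cr}^2$   if   $s_{cr}^2 = ncr \cdot V_{cr} \div (r - 1)(c - 1)$ ;
$E(s_e^2) = \sigma_e^2$   if   $s_e^2 = ncr \cdot M(V_{x.crs}) \div cr(n - 1)$.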


For purposes of computation, we may set out the foregoing results in accordance with the
schema of (xxv)-(xxxi) in 11.05:


Mean Sums of Squares            Divisor              Estimate       Expected Value
$S_c - S$                       $c - 1$              $s_c^2$        $\sigma_e^2 + n\sigma_{cr}^2 + nr\sigma_c^2$
$S_r - S$                       $r - 1$              $s_r^2$        $\sigma_e^2 + n\sigma_{cr}^2 + nc\sigma_r^2$
$S + S_{cr} - S_c - S_r$        $(r - 1)(c - 1)$     $s_{cr}^2$     $\sigma_e^2 + n\sigma_{cr}^2$
$S_a - S_{cr}$                  $rc(n - 1)$          $s_e^2$        $\sigma_e^2$
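
The balance sheet for two criteria with replication can be computed directly from totals. The sketch below is offered under stated assumptions: since 11.05 is not reproduced here, we take S as the square of the grand total divided by ncr, $S_a$ as the sum of the squared scores, and $S_c$, $S_r$, $S_{cr}$ as the sums of squared column, row and cell totals each divided by the number of scores in the total. The function name and the small data set are hypothetical.

```python
def replicated_balance_sheet(grid):
    """grid[j][i] is the list of n replicate scores for row j and column i."""
    r = len(grid)
    c = len(grid[0])
    n = len(grid[0][0])
    scores = [x for row in grid for cell in row for x in cell]

    S = sum(scores) ** 2 / (n * c * r)                  # correction term
    S_a = sum(x * x for x in scores)                    # sum of squared scores
    S_cr = sum(sum(cell) ** 2 for row in grid for cell in row) / n
    S_r = sum(sum(sum(cell) for cell in row) ** 2 for row in grid) / (n * c)
    S_c = sum(sum(sum(grid[j][i]) for j in range(r)) ** 2 for i in range(c)) / (n * r)

    return {
        "s2_c": (S_c - S) / (c - 1),
        "s2_r": (S_r - S) / (r - 1),
        "s2_cr": (S + S_cr - S_c - S_r) / ((r - 1) * (c - 1)),
        "s2_e": (S_a - S_cr) / (r * c * (n - 1)),
    }


# Hypothetical 2 x 2 lay-out with 2 replicates per cell and strictly additive
# row and column effects, so that the cell estimate vanishes.
grid = [
    [[10, 12], [14, 16]],   # first row: column 1 replicates, column 2 replicates
    [[20, 22], [24, 26]],   # second row
]
sheet = replicated_balance_sheet(grid)
```

With these figures the cell estimate is zero and the residual estimate is 2, the only variation within any pillar being the difference of 2 between replicates.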


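Complete 3-factor Analysis without interaction:
If we proceed in accordance with (ii) above, we have


$$E[V(M_{x.cs})] = E[V_{x.s}] - E[M(V_{x.cs})]$$

$$= \left(\frac{ncr-1}{ncr}\sigma_e^2 + \frac{c-1}{c}\sigma_c^2 + \frac{r-1}{r}\sigma_r^2 + \frac{n-1}{n}\sigma_n^2\right) - \left(\frac{nr-1}{nr}\sigma_e^2 + \frac{r-1}{r}\sigma_r^2 + \frac{n-1}{n}\sigma_n^2\right)$$

$$\therefore\; E[V(M_{x.cs})] = \frac{c-1}{ncr}\sigma_e^2 + \frac{c-1}{c}\sigma_c^2 \qquad \text{(vii)}$$

In the same way we derive

$$E[V(M_{x.rs})] = \frac{r-1}{ncr}\sigma_e^2 + \frac{r-1}{r}\sigma_r^2 \qquad \text{(viii)}$$

$$E[V(M_{x.ns})] = \frac{n-1}{ncr}\sigma_e^2 + \frac{n-1}{n}\sigma_n^2 \qquad \text{(ix)}$$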


Hitherto, we have based one item of our balance sheet on the difference between the total variance
and the parameters used to evaluate all the remaining estimates. So we now define


$$V_e = V_{x.s} - V(M_{x.cs}) - V(M_{x.rs}) - V(M_{x.ns})$$


Whence from (vii)-(ix) :

$$E(V_e) = \frac{ncr-1}{ncr}\sigma_e^2 - \frac{c-1}{ncr}\sigma_e^2 - \frac{r-1}{ncr}\sigma_e^2 - \frac{n-1}{ncr}\sigma_e^2$$

$$= \frac{ncr - c - r - n + 2}{ncr}\sigma_e^2 \qquad \text{(x)}$$


For purposes of computation, we may set out the results implicit in (vii)-(x) as follows :


Mean Sums of Squares               Divisor                     Estimate     Expected Value
$S_c - S$                          $c - 1$                     $s_c^2$      $\sigma_e^2 + nr\sigma_c^2$
$S_r - S$                          $r - 1$                     $s_r^2$      $\sigma_e^2 + nc\sigma_r^2$
$S_n - S$                          $n - 1$                     $s_n^2$      $\sigma_e^2 + cr\sigma_n^2$
$S_a - S_c - S_r - S_n + 2S$       $ncr - c - r - n + 2$       $s_e^2$      $\sigma_e^2$


As a clue to what follows, note that in this table, as in its predecessors and as in 13.04, the
total of the first column is $S_a - S$, the sum of square deviations from the grand mean of the
sample.
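
The additive property of the first column lends itself to a direct check. In the sketch below the definitions of S, $S_a$, $S_c$, $S_r$ and $S_n$ as squared totals divided by the number of contributing scores are our assumption (11.05 is not reproduced here), and the 2 x 2 x 2 scores are hypothetical figures chosen for easy arithmetic.

```python
def complete_three_factor_sheet(x):
    """x[h][j][i] is the single score in layer h, row j and column i."""
    n, r, c = len(x), len(x[0]), len(x[0][0])
    layers, rows, cols = range(n), range(r), range(c)
    N = n * c * r

    S = sum(x[h][j][i] for h in layers for j in rows for i in cols) ** 2 / N
    S_a = sum(x[h][j][i] ** 2 for h in layers for j in rows for i in cols)
    S_c = sum(sum(x[h][j][i] for h in layers for j in rows) ** 2 for i in cols) / (n * r)
    S_r = sum(sum(x[h][j][i] for h in layers for i in cols) ** 2 for j in rows) / (n * c)
    S_n = sum(sum(x[h][j][i] for j in rows for i in cols) ** 2 for h in layers) / (c * r)

    sums_of_squares = {
        "column": S_c - S,
        "row": S_r - S,
        "layer": S_n - S,
        "residual": S_a - S_c - S_r - S_n + 2 * S,
    }
    divisors = {"column": c - 1, "row": r - 1, "layer": n - 1,
                "residual": N - c - r - n + 2}
    estimates = {k: sums_of_squares[k] / divisors[k] for k in sums_of_squares}
    return sums_of_squares, estimates, S_a - S


# Hypothetical scores: two layers, each of two rows and two columns.
x = [
    [[10, 14], [20, 24]],   # layer 1
    [[11, 15], [21, 29]],   # layer 2
]
sums_of_squares, estimates, total = complete_three_factor_sheet(x)
```

Here the four entries of the first column are 50, 242, 8 and 8, and their total of 308 is the sum of square deviations from the grand mean.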

Meaning of Interaction. The assumption of (i) is that we have 3 strictly additive factors
of which the third is referable to circumstances inherent in replication. Exact replication is
not always possible, never perhaps in sociological enquiry. It is presumptively realisable in
laboratory practice with due precautions against introducing a new systematic source of error ;
but the balance sheet given above is meaningful only on the assumption that an independent
and strictly additive source of error peculiar to each n-fold set of cell-scores contributes to the
score value. This assumption is highly arbitrary ; but our examination of its implications is
not valueless on that account. The interest of the procedure resides in what it can tell us about
the relevance of the additive postulate. Any departure from additivity will appear as a cell to
cell variable within the layer, but constant within the pillar. Hence an estimate ($s_e^2$) of the residual
based on the square deviations from the pillar-means will not be affected thereby. On the other
hand the statistic ($s_{cr}^2$) which involves deviations from row-, column- and pillar-means will take
into account any cell-to-cell variation other than that due to the distribution of the residual
score component. Hence estimates of the universe variance based on these two statistics will
be consistent, only if the additive principle is valid and no systematic source of cell-to-cell variation
arises from faulty experimental design.

If the result of a significance test to assess the consistency of the two estimates is to encourage
the belief that there is in fact a source of variation peculiar to the n scores of the pillar within
the sample stratum, we may interpret the result in one of two ways. If convinced that the
replication introduces no source of systematic error, we conclude that there is interaction between 
row and column factors, i.e. that the additive principle does not hold good. To record the cell- 
to-cell component of variance in a balance sheet as interaction is then misleading, since zero 
interaction is inherent in the rationale of the balance sheet. Alternatively, the replication may 
involve an unrecognized specific source of systematic cell-to-cell variation. It is then possible 
that the balance sheet is valid, but the data of the experiment do not suffice to justify the assertion 
that it is. 
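
The contrast between the two estimates can be illustrated with deviations rather than totals. The sketch below computes the residual estimate from square deviations about the pillar-means and the cell estimate from what remains of each pillar-mean after removing row and column means. Both data sets are hypothetical; the second differs from the first only by an excess of 6 in one cell, which is our way of imitating a departure from additivity. No significance test is attempted here.

```python
def cell_and_residual_estimates(grid):
    """grid[j][i] is the list of n replicate scores in row j and column i.
    Returns (cell estimate, residual estimate)."""
    r, c, n = len(grid), len(grid[0]), len(grid[0][0])
    pillar = [[sum(cell) / n for cell in row] for row in grid]
    row_mean = [sum(p) / c for p in pillar]
    col_mean = [sum(pillar[j][i] for j in range(r)) / r for i in range(c)]
    grand = sum(row_mean) / r

    # Residual: square deviations of each score from its own pillar-mean.
    s2_e = sum((x - pillar[j][i]) ** 2
               for j in range(r) for i in range(c) for x in grid[j][i]) / (r * c * (n - 1))
    # Cell: departure of each pillar-mean from the additive row-plus-column pattern.
    s2_cr = n * sum((pillar[j][i] - row_mean[j] - col_mean[i] + grand) ** 2
                    for j in range(r) for i in range(c)) / ((r - 1) * (c - 1))
    return s2_cr, s2_e


additive = [[[10, 12], [14, 16]], [[20, 22], [24, 26]]]
non_additive = [[[10, 12], [14, 16]], [[20, 22], [30, 32]]]   # last cell raised by 6

print(cell_and_residual_estimates(additive))       # (0.0, 2.0)
print(cell_and_residual_estimates(non_additive))   # (18.0, 2.0)
```

The residual estimate is the same in both cases, whereas the cell estimate rises from 0 to 18 when one cell departs from the additive pattern. Whether such a discrepancy betokens interaction or a fault of replication is a question the figures alone cannot settle.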

On the other hand, the data of an experiment amenable to 3-way classification do permit 
analysis with a view to deciding whether the 3 specific criteria refer to additive components. 
For we can postulate 3 systematic cell effects in addition and test the hypothesis that each is
negligible. For simplicity let us assume that we have other grounds for believing that there
is no interaction between $F_{h.ns}$ and either $F_{i.cs}$ or $F_{j.rs}$. Our concern is then with the possibility
of interaction between $F_{i.cs}$ and $F_{j.rs}$. Accordingly we postulate :


$$x_{hij.s} = e_{hij.s} + F_{i.cs} + F_{j.rs} + F_{h.ns} + F_{ij.s}$$


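By the foregoing procedure we should then derive


$$E[V_{x.s}] = \frac{ncr-1}{ncr}\sigma_e^2 + \frac{c-1}{c}\sigma_c^2 + \frac{r-1}{r}\sigma_r^2 + \frac{n-1}{n}\sigma_n^2 + \frac{cr-1}{cr}\sigma_{cr}^2$$

$$E[V(M_{x.cs})] = \frac{c-1}{ncr}\sigma_e^2 + \frac{c-1}{cr}\sigma_{cr}^2 + \frac{c-1}{c}\sigma_c^2$$

$$E[V(M_{x.rs})] = \frac{r-1}{ncr}\sigma_e^2 + \frac{r-1}{cr}\sigma_{cr}^2 + \frac{r-1}{r}\sigma_r^2$$

$$E[V(M_{x.ns})] = \frac{n-1}{ncr}\sigma_e^2 + \frac{n-1}{n}\sigma_n^2$$

$$E(V_{cr}) = \frac{(r-1)(c-1)}{ncr}\sigma_e^2 + \frac{(r-1)(c-1)}{cr}\sigma_{cr}^2$$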


We may define a statistic whose expected value involves $\sigma_e^2$ alone by


$$V_e = M(V_{x.crs}) - V(M_{x.ns})$$

$$E(V_e) = \frac{(cr-1)(n-1)}{ncr}\sigma_e^2$$
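Our balance sheet then takes the form

Mean Sums of Squares            Divisor               Expected Value
$S_c - S$                       $c - 1$               $\sigma_e^2 + n\sigma_{cr}^2 + nr\sigma_c^2$
$S_r - S$                       $r - 1$               $\sigma_e^2 + n\sigma_{cr}^2 + nc\sigma_r^2$
$S_n - S$                       $n - 1$               $\sigma_e^2 + cr\sigma_n^2$
$S + S_{cr} - S_c - S_r$        $(r - 1)(c - 1)$      $\sigma_e^2 + n\sigma_{cr}^2$
$S + S_a - S_n - S_{cr}$        $(cr - 1)(n - 1)$     $\sigma_e^2$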


As stated, we are entitled to regard this as a balance sheet only if all 4 systematic factors 
are additive. We have no means of knowing this from the data of the experiment; but our 
concern is not to assess the contribution of a so-called factor of interaction. We wish to know 
whether there is a cell-effect which may or may not be indicative of interaction ; and the fore- 
going schema shows us which statistics (viz. the last two) must be consistent if there is no source 
of variation from cell to cell within the layer other than the residual. 

Exhaustive 3-factor Analysis. The last analysis is artificial inasmuch as we assume the
knowledge that interaction between the layer-factor and the row-factor or column-factor is
negligible, and its use is merely to clarify an exhaustive 3-factor analysis the aim of which is
both to assess and validify the balance sheet for 3 specific criteria. Validification signifies that
we must interpret the data of the experiment with a view to demonstrating zero interaction at
every level, i.e. between row and layer factors, between column and layer factors and (as above)
between row and column factors. Accordingly, we postulate two hypothetical cell factors in
addition to $F_{ij.s}$. The latter is constant within the pillar of n cells but varies from cell to cell
within the layer of cr cells and is indistinguishable from the effect of interaction between $F_{i.cs}$
and $F_{j.rs}$. Similarly, we postulate $F_{hi.s}$ as constant within the column of r cells but variable
within the row-slab of nc cells ; and $F_{hj.s}$ as constant within the row of c cells but variable
within the column-slab of nr cells. We denote the true variance of the three cell factors as
$\sigma_{cr}^2$ referable to $F_{ij.s}$ as before, $\sigma_{nc}^2$ referable to $F_{hi.s}$ and $\sigma_{nr}^2$ referable to $F_{hj.s}$. In conformity
with the Model II postulates, relevant data for the construction of the balance sheet involving
the 3 specific and 3 cell factors are then as in Table 5.


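TABLE 5


Expected Values of Components of Variance


$$\begin{array}{l|ccccccc}
 & e_{hij.s} & F_{i.cs} & F_{j.rs} & F_{h.ns} & F_{ij.s} & F_{hi.s} & F_{hj.s} \\ \hline
V_{x.s} & \frac{ncr-1}{ncr}\sigma_e^2 & \frac{c-1}{c}\sigma_c^2 & \frac{r-1}{r}\sigma_r^2 & \frac{n-1}{n}\sigma_n^2 & \frac{cr-1}{cr}\sigma_{cr}^2 & \frac{nc-1}{nc}\sigma_{nc}^2 & \frac{nr-1}{nr}\sigma_{nr}^2 \\
V_{x.cs} & \frac{nr-1}{nr}\sigma_e^2 & 0 & \frac{r-1}{r}\sigma_r^2 & \frac{n-1}{n}\sigma_n^2 & \frac{r-1}{r}\sigma_{cr}^2 & \frac{n-1}{n}\sigma_{nc}^2 & \frac{nr-1}{nr}\sigma_{nr}^2 \\
V_{x.rs} & \frac{nc-1}{nc}\sigma_e^2 & \frac{c-1}{c}\sigma_c^2 & 0 & \frac{n-1}{n}\sigma_n^2 & \frac{c-1}{c}\sigma_{cr}^2 & \frac{nc-1}{nc}\sigma_{nc}^2 & \frac{n-1}{n}\sigma_{nr}^2 \\
V_{x.ns} & \frac{cr-1}{cr}\sigma_e^2 & \frac{c-1}{c}\sigma_c^2 & \frac{r-1}{r}\sigma_r^2 & 0 & \frac{cr-1}{cr}\sigma_{cr}^2 & \frac{c-1}{c}\sigma_{nc}^2 & \frac{r-1}{r}\sigma_{nr}^2 \\
V_{x.crs} & \frac{n-1}{n}\sigma_e^2 & 0 & 0 & \frac{n-1}{n}\sigma_n^2 & 0 & \frac{n-1}{n}\sigma_{nc}^2 & \frac{n-1}{n}\sigma_{nr}^2 \\
V_{x.ncs} & \frac{r-1}{r}\sigma_e^2 & 0 & \frac{r-1}{r}\sigma_r^2 & 0 & \frac{r-1}{r}\sigma_{cr}^2 & 0 & \frac{r-1}{r}\sigma_{nr}^2 \\
V_{x.nrs} & \frac{c-1}{c}\sigma_e^2 & \frac{c-1}{c}\sigma_c^2 & 0 & 0 & \frac{c-1}{c}\sigma_{cr}^2 & \frac{c-1}{c}\sigma_{nc}^2 & 0
\end{array}$$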


We may surmise from what has gone before that


(a) expected values of $V(M_{x.cs})$, $V(M_{x.rs})$ and $V(M_{x.ns})$ will each involve the residual,
one specific factor and one or more cell factors ;


(b) expected values of $V_{cr}$ and analogous statistics ($V_{nc}$ and $V_{nr}$) will involve only the
residual and cell factors ;


(c) a statistic whose expected value depends on the residual alone is obtainable by subtracting
all the foregoing from the total sample variance ($V_{x.s}$).


Accordingly, we define by analogy

$$V_{nc} = V_{x.s} - M(V_{x.ncs}) - V(M_{x.ns}) - V(M_{x.cs})$$
$$V_{nr} = V_{x.s} - M(V_{x.nrs}) - V(M_{x.ns}) - V(M_{x.rs})$$

Fig. 105. The 3-factor Pattern. (Exhaustive 3-factor analysis: complete grid, 2 × 2 × 2, and its cell components.) 


Whence we derive from Table 5 the following : 


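$$E(V_{x.s}) = \frac{ncr-1}{ncr}\sigma_e^2 + \frac{c-1}{c}\sigma_c^2 + \frac{r-1}{r}\sigma_r^2 + \frac{n-1}{n}\sigma_n^2 + \frac{cr-1}{cr}\sigma_{cr}^2 + \frac{nc-1}{nc}\sigma_{nc}^2 + \frac{nr-1}{nr}\sigma_{nr}^2$$
$$E\,V(M_{x.cs}) = \frac{c-1}{ncr}\sigma_e^2 + \frac{c-1}{c}\sigma_c^2 + \frac{c-1}{cr}\sigma_{cr}^2 + \frac{c-1}{nc}\sigma_{nc}^2$$
$$E\,V(M_{x.rs}) = \frac{r-1}{ncr}\sigma_e^2 + \frac{r-1}{r}\sigma_r^2 + \frac{r-1}{cr}\sigma_{cr}^2 + \frac{r-1}{nr}\sigma_{nr}^2$$
$$E\,V(M_{x.ns}) = \frac{n-1}{ncr}\sigma_e^2 + \frac{n-1}{n}\sigma_n^2 + \frac{n-1}{nc}\sigma_{nc}^2 + \frac{n-1}{nr}\sigma_{nr}^2$$
$$E(V_{cr}) = \frac{(c-1)(r-1)}{ncr}\sigma_e^2 + \frac{(c-1)(r-1)}{cr}\sigma_{cr}^2$$
$$E(V_{nr}) = \frac{(n-1)(r-1)}{ncr}\sigma_e^2 + \frac{(n-1)(r-1)}{nr}\sigma_{nr}^2$$
$$E(V_{nc}) = \frac{(n-1)(c-1)}{ncr}\sigma_e^2 + \frac{(n-1)(c-1)}{nc}\sigma_{nc}^2$$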
The expected value of the difference between the first of the above and the sum of the last six is 
$$E\{V_{x.s} - V(M_{x.cs}) - V(M_{x.rs}) - V(M_{x.ns}) - V_{cr} - V_{nc} - V_{nr}\} = \frac{ncr - cr - nc - nr + c + r + n - 1}{ncr}\,\sigma_e^2$$
Whence we derive for this residual statistic the expected value 
$$\frac{(n-1)(c-1)(r-1)}{ncr}\,\sigma_e^2$$


Fig. 106. Computing Schema for the 3-factor Pattern. (Computation for exhaustive 3-factor analysis from the grand total, the square totals of the three two-way grids and the total sum of squares.) 

$$ncr\,V_{x.s} = S_q - S$$
$$ncr\,V(M_{x.cs}) = S_c - S \qquad ncr\,M(V_{x.crs}) = S_q - S_{cr}$$
$$ncr\,V(M_{x.rs}) = S_r - S \qquad ncr\,M(V_{x.ncs}) = S_q - S_{nc}$$
$$ncr\,V(M_{x.ns}) = S_n - S \qquad ncr\,M(V_{x.nrs}) = S_q - S_{nr}$$


For computation we may reconstruct the grid of r rows, c columns and n cell entries: 
(a) with r rows, n columns and c cell entries with cell totals $T_{hj}$ ; (b) with n rows, c columns and 
r cell entries with cell totals $T_{hi}$. We then define by analogy with $S_{cr}$ of (xxix) in 11.05: 

$$S_{nr} = \frac{1}{c}\sum_{h=1}^{n}\sum_{j=1}^{r} T_{hj}^2 \qquad S_{nc} = \frac{1}{r}\sum_{h=1}^{n}\sum_{i=1}^{c} T_{hi}^2$$


Our complete specification of the 3-factor set-up is then : 


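TABLE 6 

$$
\begin{array}{lll}
\text{Sums of Squares} & \text{Divisor} & \text{Expected Value of Mean Sum of Squares} \\ \hline
S_c - S & c-1 & \sigma_e^2 + nr\sigma_c^2 + n\sigma_{cr}^2 + r\sigma_{nc}^2 \\
S_r - S & r-1 & \sigma_e^2 + nc\sigma_r^2 + n\sigma_{cr}^2 + c\sigma_{nr}^2 \\
S_n - S & n-1 & \sigma_e^2 + cr\sigma_n^2 + r\sigma_{nc}^2 + c\sigma_{nr}^2 \\
S + S_{cr} - S_c - S_r & (r-1)(c-1) & \sigma_e^2 + n\sigma_{cr}^2 \\
S + S_{nc} - S_n - S_c & (n-1)(c-1) & \sigma_e^2 + r\sigma_{nc}^2 \\
S + S_{nr} - S_n - S_r & (n-1)(r-1) & \sigma_e^2 + c\sigma_{nr}^2 \\
S_q - S_{cr} - S_{nc} - S_{nr} + S_c + S_r + S_n - S & (n-1)(c-1)(r-1) & \sigma_e^2
\end{array}
$$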
The last 4 entries of the foregoing table disclose which statistics must be consistent if 
$\sigma_{cr}^2$, $\sigma_{nc}^2$ and $\sigma_{nr}^2$ are negligible. If they are so, we are entitled to assume that the specific components are additive. If so, and if we have also good reason to believe that the particular 
Model II postulates hold good, we may construct a balance sheet for the 4 remaining sources 
of variation in accordance with the foregoing prescription for 3-factor analysis without interaction. 
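
The balance sheet for the 3-factor set-up can be checked numerically. The following is a minimal sketch, not the author's own computing schema: it assumes one score per cell stored as a nested list `x[h][j][i]` (layer h, row j, column i), and the function name `three_factor_balance_sheet` and the dictionary labels are illustrative only. It forms the sums of squares with their divisors and confirms that they add up to the total.

```python
from itertools import product


def three_factor_balance_sheet(x):
    """x[h][j][i]: the single score in layer h (n layers), row j (r rows), column i (c columns)."""
    n, r, c = len(x), len(x[0]), len(x[0][0])
    H, J, I = range(n), range(r), range(c)
    cells = list(product(H, J, I))

    S = sum(x[h][j][i] for h, j, i in cells) ** 2 / (n * c * r)    # correction term
    Sq = sum(x[h][j][i] ** 2 for h, j, i in cells)                 # total sum of squares
    # squared totals for each single criterion, divided by the number of scores per total
    Sc = sum(sum(x[h][j][i] for h in H for j in J) ** 2 for i in I) / (n * r)
    Sr = sum(sum(x[h][j][i] for h in H for i in I) ** 2 for j in J) / (n * c)
    Sn = sum(sum(x[h][j][i] for j in J for i in I) ** 2 for h in H) / (c * r)
    # squared cell totals of the three two-way grids
    Scr = sum(sum(x[h][j][i] for h in H) ** 2 for j in J for i in I) / n
    Snc = sum(sum(x[h][j][i] for j in J) ** 2 for h in H for i in I) / r
    Snr = sum(sum(x[h][j][i] for i in I) ** 2 for h in H for j in J) / c

    sheet = {  # label: (sum of squares, divisor)
        "columns": (Sc - S, c - 1),
        "rows": (Sr - S, r - 1),
        "layers": (Sn - S, n - 1),
        "columns x rows": (S + Scr - Sc - Sr, (r - 1) * (c - 1)),
        "layers x columns": (S + Snc - Sn - Sc, (n - 1) * (c - 1)),
        "layers x rows": (S + Snr - Sn - Sr, (n - 1) * (r - 1)),
        "residual": (Sq - Scr - Snc - Snr + Sc + Sr + Sn - S,
                     (n - 1) * (c - 1) * (r - 1)),
    }
    return sheet, Sq - S, n * c * r - 1


# A purely additive 2 x 2 x 2 grid: every interaction and the residual should vanish.
additive = [[[h + 2 * j + 4 * i for i in range(2)] for j in range(2)] for h in range(2)]
sheet, total_ss, total_df = three_factor_balance_sheet(additive)
for label, (ss, df) in sheet.items():
    print(f"{label:18s} SS = {ss:6.2f}  divisor = {df}")
```

With strictly additive scores the three interaction entries and the residual come out as zero, which is the situation in which the text permits the simpler balance sheet without interaction.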


13.07 ONE CRITERION OF CLASSIFICATION 


In Chapter 7 of Vol. I we have seen that the sample mean score from a normal universe is normally 
distributed, as is the difference between two sample mean scores. On that basis, we can 
approximately assess the significance of a group mean difference. We can however formulate the 
null hypothesis in a different way. Given c groups of r scores, we may ask whether the assemblage 
of samples is homogeneous w.r.t. the column criterion of classification, i.e. that the column 
mean differences arise only from residual sources of variation common to all. The problem 
of assessing the significance of a group mean difference is the particular case, when c = 2. 

If we lay out in 5 columns the heights of 5r children of one sex in 5 equal age groups regardless of any peculiarities of the r score values of an age-group inter se, our concern is with only one 
criterion (age) of classification. On the assumption that the universe is homogeneous w.r.t. 
the column criterion 


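$$E\,M(V_{x.cs}) = \frac{r-1}{r}\,\sigma^2$$
$$E(V_{x.s}) = \frac{rc-1}{rc}\,\sigma^2$$
$$\therefore\; E\,V(M_{x.cs}) = E(V_{x.s}) - E\,M(V_{x.cs}) = \frac{c-1}{rc}\,\sigma^2$$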


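We may define as before a statistic whose expected value is $\sigma^2$ by the relations 

$$E(s_c^2) = \sigma^2 \quad\text{and}\quad s_c^2 = \frac{rc}{c-1}\,V(M_{x.cs}) = \frac{1}{c-1}\sum_{i=1}^{c} r(M_i - M)^2 \qquad \text{(i)}$$

Likewise we may define a second statistic whose mean value is $\sigma^2$ by the relations 

$$E(s_e^2) = \sigma^2 \quad\text{and}\quad s_e^2 = \frac{r}{r-1}\,M(V_{x.cs}) \qquad \text{(ii)}$$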


When our concern is with only one criterion of classification, we can develop criteria of 
homogeneity without imposing the restriction that the groups are all of one size. If they are not, we must 
interpret the operation $E_r$ as that of extracting the mean score or mean square score deviation within 
the column and $E_c$ as that of extracting a mean column parameter with due regard to the weight. 
If there are in all n scores in the c columns, and $r_i$ scores in the ith column, we therefore write 

$$n = \sum_{i=1}^{c} r_i \; ; \qquad M_i = \frac{1}{r_i}\sum_{j=1}^{r_i} x_{ij} \; ; \qquad M = \frac{1}{n}\sum_{i=1}^{c} r_i M_i$$


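As before, for the whole sample 

$$E(V_{x.s}) = \frac{n-1}{n}\,\sigma^2 \qquad \text{(iii)}$$

For the variance ($V_i$) within the ith column, we have 

$$E(V_i) = \frac{r_i - 1}{r_i}\,\sigma^2 \qquad \text{(iv)}$$

$$\therefore\; E\,M(V_{x.cs}) = \frac{1}{n}\sum_{i=1}^{c} r_i\,E(V_i) = \frac{\sigma^2}{n}\sum_{i=1}^{c}(r_i - 1)$$

In these expressions 

$$\sum_{i=1}^{c}(r_i - 1) = n - c,$$

$$\therefore\; E\,M(V_{x.cs}) = \frac{n-c}{n}\,\sigma^2 \quad\text{and}\quad E\,V(M_{x.cs}) = \frac{c-1}{n}\,\sigma^2 \qquad \text{(v)}$$

Accordingly we define $s_c^2$ and $s_e^2$ by the relations 

$$E(s_c^2) = \sigma^2 \quad\text{and}\quad s_c^2 = \frac{n}{c-1}\,V(M_{x.cs}) \qquad \text{(vi)}$$

$$E(s_e^2) = \sigma^2 \quad\text{and}\quad s_e^2 = \frac{n}{n-c}\,M(V_{x.cs}) \qquad \text{(vii)}$$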


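Evidently, the above are equivalent respectively to (i) and (ii) when the number of items is the 
same in all classes, so that n = rc. We must of course interpret $V(M_{x.cs})$ and $M(V_{x.cs})$ as 
weighted mean values, i.e. 

$$V(M_{x.cs}) = \frac{1}{n}\sum_{i=1}^{c} r_i (M_i - M)^2 \quad\text{and}\quad M(V_{x.cs}) = \frac{1}{n}\sum_{i=1}^{c}\sum_{j=1}^{r_i}(x_{ij} - M_i)^2 .$$

Thus we have 

$$s_c^2 = \frac{1}{c-1}\sum_{i=1}^{c} r_i (M_i - M)^2 \quad\text{and}\quad s_e^2 = \frac{1}{n-c}\sum_{i=1}^{c}\sum_{j=1}^{r_i}(x_{ij} - M_i)^2 \qquad \text{(viii)}$$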
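
The two estimates for columns of unequal size can be computed directly from the weighted sums of squares. The sketch below is illustrative rather than the author's own schema: the function name `one_way_estimates` and the toy scores are assumptions, and the function simply returns the between-column estimate $s_c^2$ and the within-column estimate $s_e^2$.

```python
def one_way_estimates(columns):
    """Return (s_c^2, s_e^2) for a one-way lay-out whose columns may differ in size."""
    n = sum(len(col) for col in columns)          # total number of scores
    c = len(columns)                              # number of columns
    grand_mean = sum(sum(col) for col in columns) / n
    means = [sum(col) / len(col) for col in columns]
    # weighted sum of squared deviations of column means from the grand mean
    between = sum(len(col) * (m - grand_mean) ** 2 for col, m in zip(columns, means))
    # sum of squared deviations of each score from its own column mean
    within = sum((x - m) ** 2 for col, m in zip(columns, means) for x in col)
    return between / (c - 1), within / (n - c)


# Two columns of unequal size (made-up scores).
col_a, col_b = [1, 2, 3], [4, 6]
s2_c, s2_e = one_way_estimates([col_a, col_b])

# With two columns, s_c^2 should reduce to (M_a - M_b)^2 / (1/r_a + 1/r_b).
m_a, m_b = sum(col_a) / len(col_a), sum(col_b) / len(col_b)
two_class_form = (m_a - m_b) ** 2 / (1 / len(col_a) + 1 / len(col_b))
print(s2_c, two_class_form, s2_e)
```

The agreement between `s2_c` and `two_class_form` illustrates the reduction to the two-class case.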


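When c = 2, we can write the column means in the form $M_a$ and $M_b$ referable respectively to 
$r_a$ and $r_b$ items, so that $n = (r_a + r_b)$. By definition therefore 

$$M = \frac{r_a M_a + r_b M_b}{r_a + r_b}$$

$$\therefore\; (M_a - M)^2 = \frac{r_b^2}{(r_a + r_b)^2}(M_a - M_b)^2 \quad\text{and}\quad (M_b - M)^2 = \frac{r_a^2}{(r_a + r_b)^2}(M_a - M_b)^2$$

$$\therefore\; s_c^2 = r_a(M_a - M)^2 + r_b(M_b - M)^2 = \frac{r_a r_b}{r_a + r_b}(M_a - M_b)^2 = \frac{(M_a - M_b)^2}{\dfrac{1}{r_a} + \dfrac{1}{r_b}} \qquad \text{(ix)}$$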
When c = 2, we may also write (ii) in the form 

$$s_e^2 = \frac{1}{r_a + r_b - 2}\left\{\sum_{j=1}^{r_a}(x_{aj} - M_a)^2 + \sum_{j=1}^{r_b}(x_{bj} - M_b)^2\right\} \qquad \text{(x)}$$

We may abbreviate (x) by the substitutions 


1 Ita J = Tp 
g= ul == -E > (%—-M)?  . - 
T Si Ll 


ae ie == e — s. 

We shall later see (16.06) that tests of significance w.r.t. homogeneity rely on the ratios of 
consistent and independent estimates of the true variance ($\sigma^2$) of the putative common universe. 
If, as we shall then see, $s_c^2$ and $s_e^2$ are indeed independent, a ratio appropriate to the test of significant departure from homogeneity with respect to one criterion of classification is $s_c^2 \div s_e^2$. When 
there are only two classes 

$$\frac{s_c^2}{s_e^2} = \frac{(M_a - M_b)^2}{\left(\dfrac{1}{r_a} + \dfrac{1}{r_b}\right)s_e^2} \qquad \text{(xii)}$$

The relation of (xii) to the square standard score of the approximate c-test of the significance 
of a group mean difference (Vol. I, Chapter 7) will suggest itself at once. Our unbiased esti- 
mate of the unit sample variance based on the pooled data is 


j=r 


2 
> (x; — MA.) . 
7 A j =| 
Whence we have for the group means 

$$s_{M_a}^2 = \frac{1}{r_a}\,s_{ab}^2 \quad\text{and}\quad s_{M_b}^2 = \frac{1}{r_b}\,s_{ab}^2$$

For the variance of the mean difference we have therefore 

$$s_d^2 = \left(\frac{1}{r_a} + \frac{1}{r_b}\right)s_{ab}^2$$

Whence the square of the appropriate c-ratio is given by 

$$c^2 = \frac{(M_a - M_b)^2}{\left(\dfrac{1}{r_a} + \dfrac{1}{r_b}\right)s_{ab}^2} \qquad \text{(xiii)}$$

When we have before us more than 2 groups distinguished in virtue of one criterion of 
classification (e.g. breed), our concern will commonly be to ascertain which mean values significantly differ. We must then base our estimate of the sampling error of $M_i$ on our estimate 
($s_e^2$) of the residual variance $\sigma_e^2$. The true variance of $M_i$ in the absence of systematic sources 
of variation will be ($\sigma_e^2 \div r_i$) and our estimate of it will be 

$$s_{m.i}^2 = \frac{s_e^2}{r_i} \qquad \text{(xiv)}$$


We may regard our sample score values as divisible into two independent components, an error 
component and a column factor in accordance with the equation 


Reg GG ee 
By hypothesis, the systematic component is constant within the column, and we may therefore 
write 


_— l 
rl eer ss Aoz, 


EMV). A ao Do; E 
Ki 


When the number of items in the columns is variable, we have seen above that 


Ea 
Ed. . a = ag rá. . y 
E. ME =o (r xe pee Casas 


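We thus define our estimate of $\sigma_e^2$ by the relations 

$$E(s_e^2) = \sigma_e^2 \quad\text{and}\quad s_e^2 = \frac{n}{n-c}\,M(V_{x.cs}) = \frac{1}{n-c}\sum_{i=1}^{c}\sum_{j=1}^{r_i}(x_{ij} - M_i)^2 \qquad \text{(xv)}$$
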

For purposes of computation it is convenient to express this in terms of sums of squares in accordance with the notation of (xii) and (xiii) in 11.05, viz. : 

$$M(V_{x.cs}) = V_{x.s} - V(M_{x.cs}) = \frac{1}{n}(S_q - S_c),$$

$$\therefore\; s_e^2 = \frac{S_q - S_c}{n-c} \qquad \text{(xvi)}$$

Whence in (xiv) 

$$s_{m.i} = \sqrt{\frac{S_q - S_c}{r_i(n-c)}} \qquad \text{(xvii)}$$


13.08 DEGREES OF FREEDOM 


It often happens that mathematical techniques with application in one domain of practice later 
become useful in a different setting. It may then be that verbal descriptions appropriate to an 
earlier usage invade a field in which their meaning has metaphorical significance for the mathe- 
matician only. We have already seen how the word moment has come to describe statistical 
operations whose connexion with statical mechanics is purely formal. The current expression 
degrees of freedom is another example of such transference of associations. To the mathematician who is au fait with the dynamical theory of the gyrostat or spinning top it is rich with 
suggestive meaning ; but it cannot be equally evocative to others; and one may well doubt 
the usefulness of verbal formulations for the student of statistics not already familiar with its 
use in other branches of applied mathematics. Accordingly, we shall first define its meaning 
as a numerical concept appropriate to the specification of an unbiased estimate. 
In preceding sections of this chapter we have repeatedly met expressions exhibiting a parameter ($p_u$) of a universe as the expected value of an n-fold sample parameter ($p_s$) in a form which 
involves an integer f less than n itself, i.e. 

$$E(p_s) = \frac{f}{n}\,p_u .$$

We then say that the sample statistic which is an unbiased estimate of $p_u$ itself is 

$$\frac{n}{f}\,p_s .$$

For the number f in such expressions, it is customary to apply the term degrees of freedom 
(d.f.). This definition does not cover all uses ; and it does not obviously tie up with the literal 
meaning of the expression. It is therefore necessary to explain that statisticians speak of degrees 
of freedom to convey a seemingly quite different intention in the taxonomic domain. 

The alternative usage arises from the need to distinguish between the number (f) of score 
classes sufficient to define a sample and the total recorded number (n) of such classes in contradistinction to items. In a classification involving one criterion (e.g. suit of cards) the rule is 
that f = (n − 1). For example, it suffices to specify the number of black cards alone in a sample, 
if our record refers only to the binary (black-red) system ; and we then say that the system itself 
has 1 d.f. If we classify our sample by suit, it is unnecessary to specify the heart score, if we 
have also specified the number of spades, clubs and diamonds it contains. In such a set-up 
n = 4 and f = 3. 

The student need have no misgiving if the connexion between the two uses of the expression is at this stage obscure, because each usage has a clear-cut domain. The one last 
mentioned may seem to be trivial at first sight. It calls for special comment only when the sample 
record invokes more than one criterion of classification. As a first example, we shall consider 
the construction of a 2 × 2 table which separately assigns the numbers of red and the numbers 
of black cards distinguishable as picture and other. If the sample consists of s cards taken from 
a full pack, it suffices to know the numbers in any 3 cells of the table, since the number in the 
fourth cell is obtainable by subtracting the 3-cell total from s. Hence f = 3. If the sample 
consists of s, cards taken from a half-pack of red only and s, from a half-pack of black only, it 
suffices to know how many picture cards each contains, and f = 2. 

An interesting situation arises when we classify w.r.t. one criterion of classification a sample 
of known size (s) and the residual pack of (52 − s) cards, as in the schema below. If we are 
entitled to assume that there are 12 picture cards in the 52-card pack, our knowledge that the 
s-fold sample contains $x_1$ picture cards suffices to define how many cards each cell contains and 
f = 1. 

             Sample        Residual pack          Total 
Picture      x_1           x_2 = (12 − x_1)       12 
Other        s − x_1       40 − s + x_1           40 
Total        s             52 − s                 52 

On the other hand, the mere fact that the pack contains 52 cards does not necessarily mean that 
it is a full pack in the ordinary sense. If we cannot assume that it does contain 12 picture cards, 
we cannot derive the value of $x_2$ from that of $x_1$ and we must assign 2 d.f. to the system. 
Let us now suppose that we take two samples of known size $s_1$ and $s_2$ from a full pack and 
classify the results as in the 3 × 3 table below. 

             First sample       Second sample      Residue                                      Total known 
Aces         a_1                a_2                4 − a_1 − a_2                                4 
Picture      b_1                b_2                12 − b_1 − b_2                               12 
Other        s_1 − a_1 − b_1    s_2 − a_2 − b_2    36 − s_1 − s_2 + a_1 + a_2 + b_1 + b_2       36 
Total        s_1                s_2                52 − s_1 − s_2                               52 


In this set-up we can fill 9 cells, if we have the information to fill any 4 of them. In each 
of 2 = (3 − 1) rows, we have to fill 2 = (3 − 1) cells or (3 − 1)(3 − 1) in all. More generally, 
it suffices to fill (r − 1)(c − 1) cells of a grid of r rows and c columns, if we know all the marginal 
totals, but if we merely know the grand total we then need to fill (rc − 1) cells. If we do know 
all the marginal totals, we therefore assign (r − 1)(c − 1) d.f. to the system. If we only know 
the grand total f = (rc − 1). The expression degrees of freedom is meaningful in this context 
in as far as it specifies how many cells of a grid we are free to fill in any way consistent with the 
prescribed conditions without forfeiting the power to fill the remaining ones. 
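
The counting rule can be made concrete with a small sketch. Given all the marginal totals, the hypothetical helper `complete_table` below (an illustrative name, not drawn from the text) takes the (r − 1)(c − 1) freely chosen cells and fills in the rest by subtraction; the sample sizes and card counts are made-up values for the two-sample card example.

```python
def complete_table(free_cells, row_totals, col_totals):
    """Fill an r x c table from its (r - 1) x (c - 1) free cells and all marginal totals."""
    r, c = len(row_totals), len(col_totals)
    # the last cell of each of the first r - 1 rows follows from the row total
    table = [list(row) + [row_totals[k] - sum(row)] for k, row in enumerate(free_cells)]
    # every cell of the last row follows from the column totals
    table.append([col_totals[j] - sum(table[k][j] for k in range(r - 1)) for j in range(c)])
    return table


# Two samples of s1 = 10 and s2 = 8 cards from a full pack (assumed counts):
# first sample has 1 ace and 3 picture cards, second has 0 aces and 2 picture cards.
s1, s2 = 10, 8
table = complete_table(
    free_cells=[[1, 0], [3, 2]],           # (3 - 1)(3 - 1) = 4 free cells
    row_totals=[4, 12, 36],                # aces, picture, other
    col_totals=[s1, s2, 52 - s1 - s2],     # first sample, second sample, residue
)
for row in table:
    print(row)
```

Four numbers thus suffice to fix all nine cells, which is what assigning 4 d.f. to the 3 × 3 system means here.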

We have now a clue to what lies behind the use of the term d.f. for the denominator in our 
unbiased sample statistics of foregoing sections of this chapter. In the two-way set-up, the total 
sample variance makes use of only one item of information about the rc cells in the grid, viz. 
the grand mean (M). If this is the only fixed parameter of the grid, we are free to assign scores 
of any value not exceeding an aggregate of rcM to (rc — 1) cells. On the other hand, our ex- 
pression for s? involves also the mean value of each row and each column. In the same sense, 
therefore, we are free to fill only (c — 1) cells in each of (r — 1) rows. The total then is the 
product (c — 1)(r — 1) which replaces rc in the denominator of V,. , and is what we have other- 
wise defined as the d.f. of the statistic sí. 

The foregoing remarks throw no light on the use of the term, when we later speak of a Chi- 
Square variate for f degrees of freedom for reasons explained in 16.04. There we shall also see 
why degrees of freedom are additive in the sense that the total of the divisors of a complete balance 
sheet is the divisor (rc — 1) or (mcr — 1), etc. of the unbiased sample statistic of which the 
numerator sre. Us, OEM. Via 0 


AN ARITHMETICAL EXAMPLE 


The following illustrates the procedure for estimating residual variance of data involving : 
(a) one criterion alone ; (b) two criteria of classification. The figures are from a paper by Rogers 
and Johnstone* who used aerial slit sampling to compare the effect on the number of bacterial 
colonies obtained from air of a hospital ward after sweeping an oiled floor with a broom and after 
use of a vacuum cleaner. The same ward of a premature baby unit was swept by a broom on three 
successive days following the day the floor was oiled, and by a Hoover on the same three days 
of the next week. Counts were made on culture media exposed at intervals of one minute during 
the three minutes before sweeping began, at one minute intervals for four minutes while sweeping 
went on and at one minute intervals for three minutes after it ended. 

* Rogers and Johnstone, 1951. J. Hyg. 49, 497. 

Counts made on the days when the ward was swept by a broom were as follows : 


Successive Observations        Wednesday   Thursday   Friday 

Before sweeping      1            59          40        40 
                     2            50          37        35 
                     3            61          44        32 

During sweeping      4            56          22        40 
                     5            50          21        20 
                     6            30          30        24 
                     7            62          66        23 

After sweeping       8            63          40        22 
                     9            38          30        23 
                    10            32          30        23 


If we first assume that the only systematic source of variation is referable to the effect of 
sweeping, we set out our data as below. We have then 3 columns and 30 observations in all, 


so that (n — c) = (30 — 3) = 27. 


Day and Minute          Before Sweeping   During Sweeping   After Sweeping 
                             (x_a)             (x_b)             (x_c) 
Wednesday   1                 59                56                63 
            2                 50                50                38 
            3                 61                30                32 
            4                 -                 62                - 
Thursday    1                 40                22                40 
            2                 37                21                30 
            3                 44                30                30 
            4                 -                 66                - 
Friday      1                 40                40                22 
            2                 35                20                23 
            3                 32                24                23 
            4                 -                 23                - 

No. of observations            9                12                 9 

$\sum x_a = 398$ ; $\sum x_b = 444$ ; $\sum x_c = 301$ ; 
$M_a = 44.22$ ; $M_b = 37.00$ ; $M_c = 33.44$ ; 
$\sum (x_a - M_a)^2 = 855.56$ ; $\sum (x_b - M_b)^2 = 3238.0$ ; $\sum (x_c - M_c)^2 = 1312.22$ ; 

$$s_e^2 = \frac{1}{27}\left\{\sum (x_a - M_a)^2 + \sum (x_b - M_b)^2 + \sum (x_c - M_c)^2\right\} = \frac{1}{27}\{855.56 + 3238.0 + 1312.22\} = 200.21 .$$

Thus our residual variation ($s_e$) is $\sqrt{200.21} = 14.15$ and the standard errors of the means are 

$$M_a : \frac{14.15}{\sqrt{9}} \; ; \qquad M_b : \frac{14.15}{\sqrt{12}} \; ; \qquad M_c : \frac{14.15}{\sqrt{9}}$$
Thus we have for the standard errors of the differences between the means 
$(M_a - M_b)$ and $(M_b - M_c)$ : $\sqrt{(4.72)^2 + (4.09)^2} = \pm 6.24$ ; $(M_a - M_c)$ : $\sqrt{2(4.72)^2} = \pm 6.67$. 
We may thus summarise the outcome of our calculations as 
$(M_a - M_b) = 7.22 \pm 6.24$ ; 
$(M_a - M_c) = 10.78 \pm 6.67$ ; 
$(M_b - M_c) = 3.56 \pm 6.24$. 
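
The one-way calculation above can be reproduced in a few lines. This is a sketch for checking the arithmetic only; the variable names are illustrative, and the counts are those of the broom-sweeping table regrouped by phase.

```python
from math import sqrt

# Counts regrouped by phase (Wednesday, Thursday, Friday in turn).
before = [59, 50, 61, 40, 37, 44, 40, 35, 32]
during = [56, 50, 30, 62, 22, 21, 30, 66, 40, 20, 24, 23]
after = [63, 38, 32, 40, 30, 30, 22, 23, 23]
groups = [before, during, after]

n = sum(len(g) for g in groups)      # 30 observations
c = len(groups)                      # 3 columns
means = [sum(g) / len(g) for g in groups]

# pooled within-column sum of squares and the residual variance estimate
within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
s2_e = within / (n - c)


def se_difference(a, b):
    """Standard error of the difference between the means of columns a and b."""
    return sqrt(s2_e / len(groups[a]) + s2_e / len(groups[b]))


print([round(m, 2) for m in means], round(s2_e, 2))
print(round(se_difference(0, 1), 2), round(se_difference(0, 2), 2))
```

Working from unrounded standard errors, as here, gives the same two-decimal results as the hand calculation.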


None of these differences is significant at the 2c level ; but we may be misled by the circum- 
stance that our estimate of residual variance is excessive because of failure to eliminate a second 
source of systematic variation, viz. the number of days which have elapsed since oiling the floors. 
Accordingly, we set out our data thus for two criteria of classification : 


                 Before Sweeping      During Sweeping             After Sweeping 
Minute (i) :      1      2      3      4      5      6      7      8      9     10     Σ_i x_ij   (Σ_i x_ij)²    M_j 
Wednesday        59     50     61     56     50     30     62     63     38     32       501        251001      50.1 
Thursday         40     37     44     22     21     30     66     40     30     30       360        129600      36.0 
Friday           40     35     32     40     20     24     23     22     23     23       282         79524      28.2 

Σ_j x_ij        139    122    137    118     91     84    151    125     91     85 
(Σ_j x_ij)²   19321  14884  18769  13924   8281   7056  22801  15625   8281   7225 
M_i           46.33  40.67  45.67  39.33  30.33  28.00  50.33  41.67  30.33  28.33 
Σ_j x_ij²      6681   5094   6681   5220   3341   2376   8729   6053   2873   2453 


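We now calculate the residual variance by the formula 

$$s_e^2 = \frac{S_q - S_r - S_c + S}{(r-1)(c-1)}$$

In this expression r = 3, c = 10 and 

$$S = \frac{1}{rc}\left[\sum_{i=1}^{c}\sum_{j=1}^{r} x_{ij}\right]^2 = 43548.3 \; ;$$

$$S_c = \frac{1}{r}\sum_{i=1}^{c}\left[\sum_{j=1}^{r} x_{ij}\right]^2 = 45389.0 \; ;$$

$$S_r = \frac{1}{c}\sum_{j=1}^{r}\left[\sum_{i=1}^{c} x_{ij}\right]^2 = 46012.5 \; ;$$

$$S_q = \sum_{i=1}^{c}\sum_{j=1}^{r} x_{ij}^2 = 49501 \; ;$$

$$\therefore\; s_e^2 = \tfrac{1}{18}(1647.8) = 91.54 .$$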
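
The same quantities follow from the 3 × 10 grid of counts. The sketch below is only a check on the arithmetic; the names `S`, `S_c`, `S_r` and `S_q` mirror the sums of squares used for the two-way lay-out, with rows taken as days and columns as minutes.

```python
# Rows are days (r = 3), columns are successive minutes (c = 10).
grid = [
    [59, 50, 61, 56, 50, 30, 62, 63, 38, 32],   # Wednesday
    [40, 37, 44, 22, 21, 30, 66, 40, 30, 30],   # Thursday
    [40, 35, 32, 40, 20, 24, 23, 22, 23, 23],   # Friday
]
r, c = len(grid), len(grid[0])

grand_total = sum(sum(row) for row in grid)
S = grand_total ** 2 / (r * c)                        # correction term
S_q = sum(x ** 2 for row in grid for x in row)        # total sum of squares
S_r = sum(sum(row) ** 2 for row in grid) / c          # from row (day) totals
S_c = sum(sum(col) ** 2 for col in zip(*grid)) / r    # from column (minute) totals

residual_ss = S_q - S_r - S_c + S
s2_e = residual_ss / ((r - 1) * (c - 1))
print(round(S, 1), round(S_c, 1), round(S_r, 1), S_q, round(s2_e, 2))
```

Removing the day-to-day variation in this way more than halves the residual estimate obtained from the one-way lay-out.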
In every column of the last table the number of observations is 3. Whence the variance of each 
column mean is the same, viz. 91.54 ÷ 3 = 30.51. The variance of the difference between each 
pair of column means is also the same, viz. : 30.51 + 30.51 = 61.02 and its s.e. is √61.02 = 7.81. 
The column means are as follows : 

Before Sweeping : 46.33, 40.67, 45.67 
During Sweeping : 39.33, 30.33, 28.00, 50.33 
After Sweeping : 41.67, 30.33, 28.33 

Thus the greatest and least difference between the before and after column means are now 
(46.33 − 28.33) = 18 and −1 each with a standard error of 7.81; and this method of analysis 
discloses nothing to discredit the conclusion suggested by the foregoing treatment. 

We may now look at the same data set out in each way from a different viewpoint. If we 
adopt the one-way classification we may ask whether the grid is homogeneous w.r.t. the column 
criterion. If we adopt the 2-way procedure, we may ask whether our data justify the assumption 
that there is a second source of systematic variation implicit in the row criterion. To test homogeneity w.r.t. the column criterion in the one-way grid, we require the statistic of (viii) in 13.07, 
or: 

$$s_c^2 = \frac{1}{c-1}\sum_{i=1}^{c} r_i (M_i - M)^2 \qquad \text{(i)}$$

To test homogeneity of the row criterion in the second lay-out, we require the statistic $s_r^2$ defined 
by (v) of 13.03, viz. : 

$$s_r^2 = \frac{S_r - S}{r-1} \qquad \text{(ii)}$$

In the first lay-out, (c − 1) = 2 and 
M = (398 + 444 + 301) ÷ 30 = 38.1. 
Whence we have 
$$s_c^2 = \tfrac{9}{2}(44.22 - 38.1)^2 + \tfrac{12}{2}(37.00 - 38.1)^2 + \tfrac{9}{2}(33.44 - 38.1)^2 = 273.525 .$$

Thus the variance ratio for the test of homogeneity w.r.t. the column criterion in the one-way 
lay-out is 
$$\frac{s_c^2}{s_e^2} = \frac{273.525}{200.21} = 1.37 \qquad \text{(iii)}$$

In (ii) since (r − 1) = 2, the sum of squares referable to the row (day) totals is $S_r - S$, so that 
$$s_r^2 = \frac{46012.5 - 43548.3}{2} = 1232.1 .$$
Thus the required variance ratio is 
$$\frac{s_r^2}{s_e^2} = \frac{1232.1}{91.54} = 13.46 \qquad \text{(iv)}$$


At this stage, we have not established the appropriate test of significance (Chapter 16) for 
(iii) and (iv); but here note that (iii) is a ratio of variances with divisors (c — 1) = 2 and 
(n — c) = 27, while (iv) is a ratio of variances with divisors (r — 1) = 2 and (r — 1)(c — 1) = 18. 
As we shall later see, the expected values of (iii) and (iv) are respectively 27 ÷ 25 = 1.08 
and 18 ÷ 16 = 1.125. At the 5 per cent. significance level F for the divisors (so-called degrees 
of freedom) 2 and 27 is about 3.4 and about 3.6 for 2 and 18. At the 1 per cent. level F for 2 and 18 d.f. 
is about 6.0. Thus there are high odds for a systematic source of variation associated with how 
recently the floors were oiled, and our use of the lower estimate for the residual variance is accordingly justifiable. We may accordingly recalculate the standard error of the mean for our 
initial lay-out of 3 columns as below : 

$$M_a \text{ or } M_c : \sqrt{\frac{91.54}{9}} \; ; \qquad M_b : \sqrt{\frac{91.54}{12}}$$

Whence our corrected differences are 
$M_a - M_b = 7.22 \pm 4.2$ ; 
$M_a - M_c = 10.78 \pm 4.5$ ; 
$M_b - M_c = 3.56 \pm 4.2$. 
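
As a check on this arithmetic, the sketch below recomputes both variance ratios and the corrected standard errors directly from the counts. It is an illustration only: the variable names are assumptions, the column ratio uses unrounded means, and the row ratio uses the sum of squares formed from the three day totals.

```python
from math import sqrt

grid = [
    [59, 50, 61, 56, 50, 30, 62, 63, 38, 32],   # Wednesday
    [40, 37, 44, 22, 21, 30, 66, 40, 30, 30],   # Thursday
    [40, 35, 32, 40, 20, 24, 23, 22, 23, 23],   # Friday
]
# One-way lay-out: minutes 1-3 before, 4-7 during, 8-10 after sweeping.
phases = [[x for row in grid for x in row[a:b]] for a, b in ((0, 3), (3, 7), (7, 10))]

n = sum(len(p) for p in phases)
grand_mean = sum(map(sum, grid)) / n
means = [sum(p) / len(p) for p in phases]

# Ratio for the column (phase) criterion in the one-way lay-out.
s2_c = sum(len(p) * (m - grand_mean) ** 2 for p, m in zip(phases, means)) / (len(phases) - 1)
s2_e_one_way = sum((x - m) ** 2 for p, m in zip(phases, means) for x in p) / (n - len(phases))
ratio_phases = s2_c / s2_e_one_way

# Ratio for the row (day) criterion in the two-way lay-out.
r, c = len(grid), len(grid[0])
S = sum(map(sum, grid)) ** 2 / (r * c)
S_q = sum(x ** 2 for row in grid for x in row)
S_r = sum(sum(row) ** 2 for row in grid) / c
S_c = sum(sum(col) ** 2 for col in zip(*grid)) / r
s2_e_two_way = (S_q - S_r - S_c + S) / ((r - 1) * (c - 1))
ratio_days = (S_r - S) / (r - 1) / s2_e_two_way

# Standard errors of the phase-mean differences using the lower residual estimate.
se_ab = sqrt(s2_e_two_way * (1 / len(phases[0]) + 1 / len(phases[1])))
se_ac = sqrt(s2_e_two_way * (1 / len(phases[0]) + 1 / len(phases[2])))
print(round(ratio_phases, 2), round(ratio_days, 2), round(se_ab, 1), round(se_ac, 1))
```

The row ratio is many times its expected value, whereas the column ratio is close to it, which is the contrast the text draws.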


Addendum. This chapter embodies the writer’s method of presenting to students of biology 
assumptions which underlie the Analysis of Variance at a time when the concept of significance 
had not yet become the target of a formidable body of criticism, as explained in a final 
chapter written after the rest of the book had gone to press. It may be too early to surmise 
how far the technique of estimation embodied in Churchill Eisenhart's exposition of his Model I 
situation will stand the test of time ; but we may be confident that the battery of test procedures 
based on the F-ratio, as expounded in Chapter 16 below and illustrated by the foregoing numerical 
example, will retain no place in a future curriculum of statistics, if the views of Wald and Neyman 
gain ascendancy. Meanwhile, some of the foregoing exposition may not be valueless, if it 
focuses attention on neglected factual assumptions implicitly invoked by those who continue 
to use the method. 


CHAPTER 14 
14.01 REALITY AND RIGOUR 


STATISTICS is a branch of applied mathematics. As such it involves issues of two kinds, correct 
mathematics and correct application. The theme of our last two chapters has been the latter, 
in so far as our concern has been to explore in what circumstances assumptions implicit in the 
statistical techniques known as correlation and analysis of variance are more or less relevant to 
the real world. Having decided, with appropriate reservations, to adopt a procedure of one or 
other sort, we come face to face with the problem of significance. This is partly a mathematical 
issue and partly a matter of common sense, i.e. awareness of reality. In Chapter 16 we shall 
deal with significance tests appropriate to the issues raised in Chapters 12-13. In this Chapter 
and the one which follows our aim will be to lay the foundations. First, we may pause with 
profit to get clear about what we are discussing when we invoke a significance test. 

A significance test assigns the odds for or against an occurrence prescribed by a particular 
hypothesis. As such it tells us the ratio of all frequencies of a class of events which exclude its 
specification to all frequencies of a class of events which include it. From a mathematical 
viewpoint this is an exercise involving the summation of two sets of frequencies. From a 
practical viewpoint, the outcome is not unique unless we have some additional information to 
guide us in assessing what odds justify the rejection of the null hypothesis. Thomas Bayes 
pointed this out two hundred years ago, though it is still a common delusion that odds of 20: 1 
(or 370 : 1, according to taste) against the occurrence on the assumption that the null hypothesis 
is correct suffice to justify us in rejecting the latter. 

In this context, there is no need to elaborate previous remarks (Chapter 5, Vol. I) on so 
prevalent a misconception. Our task is here to clarify what we do when we assess the odds ; 
and this involves prior knowledge of the parent universe. When we can exactly specify the 
universe (e.g. cubical die or card pack) from which we sample the procedure is easy to formulate 
and to visualise in terms of the areas of columns (Chapter 3, Vol. I) of a histogram of unit area. 
The nature of the sampling process and the distribution of corresponding unit scores in the 
parent universe supply all the information relevant to the specification of the r-fold sample by 
methods now familiar to the reader; but the exact distribution of unit scores in the universe 
of scientific enquiry is something we rarely know. Indeed, a 2-class universe (e.g. hearts and 
other cards) is the only one of which it is true to say that the algebraic form, i.e. (q + p), of 
the unit sample distribution (u.s.d.) is implicit in the definition of it. In the domain of 
representative scoring, we can so specify the algebraic form of the u.s.d. if we score by rank 
but then only if there are no ties. If so, of course, the distribution is rectangular. Otherwise, 
experience alone can justify whether a particular mathematical expression is or is not a reliable 
description of the universe of which our observations constitute a sample. In seeking such a 
description as a basis for a prescription of the consequences of sampling, we are not entitled to 
expect that we shall be able to find an exact one; and if we have to choose between several 
seemingly good enough expressions, mathematical convenience will necessarily influence our 
choice. That is to say, we shall prefer an expression which is most amenable to the algebraic 
manipulations invoked by sampling theory. 

One relevant consideration in this connexion is that mathematicians are much more skilful 
in assessing the area of segments of a continuous curve (e.g. the normal) than in summing exactly 


6 
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a series of terms increasing by finite steps. We can, however, accomplish the latter approxi- 
mately by visualising the contour of a histogram as the jagged outline of a continuous curve ; 
and the outcome is often as precise as need be. Curves which are monotonic (p. 328, Vol. I) 
or unimodal have special descriptive advantages, because we can specify the properties of a wide 
range of types by the method of moments (Chapter 6, Vol. I). From a mathematical viewpoint, this 
is a great convenience ; but it is important to remember that we rarely, if ever, meet a statistical 
universe to which we can be confident in assigning a continuous unimodal curve (e.g. the normal) 
as an exact description of the score distribution. It is conceivably true that such a curve 
would truly describe the distribution of the weights of beans in a pure line: but it is an act of 
faith to assert that in fact it does so. What we may know is that our observations exhibit no 
discontinuities if we plot them appropriately ; but an inescapable limitation to any legitimate 
inference from this procedure is that our measuring instruments involve a scale of finite 
increments. 

Advances in the theory of statistics leading to the distributions of Chapter 16 have gone hand 
in hand with greater concern for rigour, i.e. exacting criteria of what we can rightly infer from 
our initial assumptions ; but how far this will eventually prove to be a sign of healthy growth 
must depend on how far we fully realise the adequacy of the initial assumptions as a description 
of the real world. It is especially needful to be on our guard against the plausibility of the following syllogism : (a) such and such a distribution closely describes the universe of our observations ; 
(b) sampling from a universe with such a distribution leads to certain consequences ; (c) sampling 
from the universe of our observations has the same consequences. This is a non sequitur. 
The first premise is purely empirical, the second purely formal and (c) is valid only if the 
procedure involved in (b) does not unduly magnify any of the errors implicit in the qualification 
closely. 

In conformity with these considerations, we are committed to scrutinise any argument which 
starts with the assumption that a universe is normal or that of an r-fold sample therefrom 
is normal. We shall be in a better position to appreciate how often such an assumption is 
legitimate for practical purposes, if we examine the characteristics of discrete distributions with 
a view to formulating criteria of the adequacy of substituting a normal (or other continuous) 
curve as a descriptive device. In this, and in the next two chapters, we shall assume two pro- 
positions to be acceptable without elaborating earlier remarks (Chapter 6, Vol. I) on the meaning 
of moments as descriptive parameters of a distribution : 


(a) if all the moments of two distributions are identical, we are entitled to regard the dis- 
tributions as identical ; 


(b) if all the moments of one distribution (B) lie between those of two others (A and C) 
which tally sufficiently for practical purposes, we may use A or C (as most convenient) 
to describe B. 


These assumptions sufficiently explain the need for a closer study of procedures for evaluating 
moments (14.02 and 14.03) and for examining (as in other sections of this chapter) the circum- 
stances in which the moments of a prescribed sampling distribution approach those of the normal 
or other continuous function which is easy to tabulate for reference. Not all the material in 
what follows is essential to an understanding of the rest of the book ; and the reader may well 
prefer to scan this chapter quickly, returning to it at a later stage if necessary. 

When we say that a continuous variate (y) is satisfactory as a function descriptive of a dis- 
crete sampling distribution, we may adopt either or both of two criteria, having in mind the 
relation of the approximate fitting curve y = f(x) to the contour of the exact histogram. We 
construct the latter * on the assumption that the area of each column of height $y_x$ on a base 
extending from $(x - \tfrac{1}{2}\Delta x)$ to $(x + \tfrac{1}{2}\Delta x)$ is numerically equivalent to the frequency ($f_x$) of the 
score x. Since the sum of all frequencies is unity by definition, for a score with positive values 
only 

$$\sum_{x=0}^{\infty} f_x = \sum_{x=0}^{\infty} y_x\,\Delta x = 1 \qquad \text{(i)}$$

The reader will note that the exact limits are irrelevant, since $f_x = 0$ if x lies outside the range 
of permissible score values. If x may be negative or positive, we must write the above as 

$$\sum_{x=-\infty}^{\infty} f_x = \sum_{x=-\infty}^{\infty} y_x\,\Delta x = 1 \qquad \text{(ii)}$$

The sum of the frequencies in the range x = a to x = b is 

$$\sum_{x=a}^{b} f_x = \sum_{x=a}^{b} y_x\,\Delta x \qquad \text{(iii)}$$

The actual boundaries of the corresponding columns of the histogram of unit area are not the 
mid-points a and b but $(a - \tfrac{1}{2}\Delta x)$ and $(b + \tfrac{1}{2}\Delta x)$. 
With due regard to these considerations, we may ask for a fitting curve such that
(a) if the ordinate y at x goes nearly through the mid-point of the upper extremity of the
corresponding column of the histogram, we can approximately specify the frequency
of the score x if we know the scale $\Delta x$, i.e.

\[ y \simeq y_x \quad\text{and}\quad f_x \simeq y \cdot \Delta x \tag{iv} \]

(b) if the area bounded by the ordinates of the curve at x = a and x = b corresponds closely
with that of the segment of the histogram with the corresponding boundaries at
$(a - \tfrac12\Delta x)$ and $(b + \tfrac12\Delta x)$, we can approximately specify the net frequencies of score
values within the range, i.e.

\[ \sum_{x=a}^{x=b} y_x \, \Delta x \simeq \int_{a-\frac12\Delta x}^{b+\frac12\Delta x} y \, dx \tag{v} \]


In accordance with our definition of frequency, any such fitting curve must also fulfil the
criterion of unit area over the whole range of permissible values of x itself: (i) positive only,
(ii) negative and positive:

\[ \text{(i)}\ \int_{0}^{\infty} y \, dx = 1; \qquad \text{(ii)}\ \int_{-\infty}^{\infty} y \, dx = 1 \tag{vi} \]


To make the foregoing criteria explicit we have to define a criterion of error, as follows:

(a) if $\epsilon_x$ is the error consistent with a satisfactory ordinate fit in a specified range, we may
express (iv) more explicitly as

\[ (y_x - y) \cdot \Delta x \le |\epsilon_x| \tag{vii} \]

(b) if $\epsilon_{ab}$ is the error consistent with a satisfactory expectation fit

\[ \sum_{x=a}^{x=b} y_x \, \Delta x - \int_{a-\frac12\Delta x}^{b+\frac12\Delta x} y \, dx \le |\epsilon_{ab}| \tag{viii} \]


* The reader may find it useful in this context to read more detailed remarks on the build-up of the frequency
histogram in 15.01 below.
In discussions on curve fitting, there is not always an explicit recognition that the two criteria
defined by (vii) and (viii) have different domains of relevance. If we are seeking a criterion of
goodness of fit for an empirical distribution, (vii) may have more to commend it, inasmuch as our
main object in so graduating our data may be to assign a numerical value to the frequency of the
occurrence of a particular score. This is not the end in view when we perform a significance
test. Our concern is then to sum frequencies of all score values which do not exceed a specified
limit or limits; and (viii) is the appropriate criterion. The distinction is important in so far
as no unimodal descriptive function could be satisfactory in the sense defined by (vii), if the
exact distribution has gaps such as we have seen (Chapter 4, Vol. I) to be characteristic of the
proportionate score difference distribution of co-prime samples from an
infinite 2-class universe. On the other hand, the fact that we can eliminate such gaps by grouping
disposes of any objection against seeking a satisfactory fit in accordance with (viii).


14.02 MOMENT GENERATING FUNCTION


The type of generating function dealt with in 11.08 is one of many ways of summarising
the operations of the independence grid with a view to specification of a sum or difference
distribution. The essential desideratum of such a g.f. is that the dummy factor which identifies
the score value associated with a particular frequency as a co-factor obeys the product rule for
indices, viz. $t^a \cdot t^b = t^{a+b}$ and $t^a \cdot t^{-b} = t^{a-b}$. This, of course, is equally true if we express t
in exponential form as $t = e^h$, so that $e^{ah} \cdot e^{bh} = e^{(a+b)h}$. If $u_x$ is the frequency of the score
x we may therefore label our dummy co-factor as $e^{xh}$, defining a new class of generating functions,
exactly as before with the substitution $t = e^h$, so that

\[ G_h = u_0 + u_1 e^{h} + u_2 e^{2h} + u_3 e^{3h} \ldots \tag{i} \]
\[ G_{-h} = u_0 + u_1 e^{-h} + u_2 e^{-2h} + u_3 e^{-3h} \ldots \tag{ii} \]


If we label the cell frequencies of the grid for 2 independent variates $u_i$ and $v_j$ as $y_{ij} = u_i v_j$
in accordance with the schemata exhibited in 11.08, the rules for the summation of frequencies
of the sum (s) and difference (d) are as there given, viz.:

\[ f_s = \sum_{x} y_{x,\,s-x} \quad\text{and}\quad f_d = \sum_{x} y_{x+d,\,x}, \]

each summation extending over the cells of the grid which lie on the relevant diagonal.


If $G_h$ and $G_{-h}$ are each referable to score frequencies of the unit sample distribution, and $x_a$ and
$x_b$ are respectively score-sums of independent a-fold and b-fold samples from the same universe,

\[ G(x_a + x_b) = G_h^{\,a+b} \quad\text{and}\quad G(x_a - x_b) = G_h^{\,a} \cdot G_{-h}^{\,b} \tag{iii} \]

For the proportionate (or mean) score difference distribution:

\[ G\!\left(\frac{x_a}{a} - \frac{x_b}{b}\right) = \left(u_0 + u_1 e^{h/a} + u_2 e^{2h/a} + u_3 e^{3h/a} \ldots\right)^{a} \left(u_0 + u_1 e^{-h/b} + u_2 e^{-2h/b} + u_3 e^{-3h/b} \ldots\right)^{b} \]
TABLE 1

THE M.G.F. AS A GRID OPERATION

Second Zero Moment of Heart-Score sums (s) w.r.t. 3-fold and 2-fold samples taken with replacement from
two full packs

3-fold Sample
    Raw-Score             0                       1                       2                       3
    Exponential Score     e^{0}                   e^{h}                   e^{2h}                  e^{3h}
    Frequency             27/64                   27/64                   9/64                    1/64

2-fold Sample: Raw-Score 0; Exponential Score e^{0}; Frequency 9/16
    s = 0; s^2 = 0          s = 1; s^2 = 1          s = 2; s^2 = 4          s = 3; s^2 = 9
    e^{0}.e^{0} = e^{0}     e^{h}.e^{0} = e^{h}     e^{2h}.e^{0} = e^{2h}   e^{3h}.e^{0} = e^{3h}
    y_00 = 243/1024         y_10 = 243/1024         y_20 = 81/1024          y_30 = 9/1024

2-fold Sample: Raw-Score 1; Exponential Score e^{h}; Frequency 6/16
    s = 1; s^2 = 1          s = 2; s^2 = 4          s = 3; s^2 = 9          s = 4; s^2 = 16
    e^{0}.e^{h} = e^{h}     e^{h}.e^{h} = e^{2h}    e^{2h}.e^{h} = e^{3h}   e^{3h}.e^{h} = e^{4h}
    y_01 = 162/1024         y_11 = 162/1024         y_21 = 54/1024          y_31 = 6/1024

2-fold Sample: Raw-Score 2; Exponential Score e^{2h}; Frequency 1/16
    s = 2; s^2 = 4          s = 3; s^2 = 9          s = 4; s^2 = 16         s = 5; s^2 = 25
    e^{0}.e^{2h} = e^{2h}   e^{h}.e^{2h} = e^{3h}   e^{2h}.e^{2h} = e^{4h}  e^{3h}.e^{2h} = e^{5h}
    y_02 = 27/1024          y_12 = 27/1024          y_22 = 9/1024           y_32 = 1/1024

Diagonal Frequencies (x 1024)
    Y_0     Y_1     Y_2     Y_3     Y_4     Y_5
    243     405     270     90      15      1

\[ E(e^{sh}) = 2^{-10}\left(243 + 405e^{h} + 270e^{2h} + 90e^{3h} + 15e^{4h} + e^{5h}\right); \]
\[ D_h^2 \cdot E(e^{sh}) = 2^{-10}\left(405e^{h} + 1080e^{2h} + 810e^{3h} + 240e^{4h} + 25e^{5h}\right); \]
\[ \left(D_h^2 \cdot E(e^{sh})\right)_{h=0} = 2^{-10}(405 + 1080 + 810 + 240 + 25) = \tfrac{5}{2}; \]
\[ E(s^2) = \mu_2(s) = \tfrac{5}{2} = 5pq + 25p^2. \]


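In doing this we have done nothing new, since $t = e^h$ is merely a dummy for purposes of identification;
but the substitution has the merit that the generating function so expressed has a dual
purpose. We can use it as before to write down the terms of a sampling distribution; but we
can also use it to evaluate its moments in a new way by taking advantage of the differential
property of $e^{xh}$, viz.:

\[
\begin{aligned}
e^{xh} &= 1 + xh + \frac{x^2h^2}{2!} + \frac{x^3h^3}{3!} + \frac{x^4h^4}{4!} + \frac{x^5h^5}{5!} \ldots \\
D_h\, e^{xh} &= x + x^2h + \frac{x^3h^2}{2!} + \frac{x^4h^3}{3!} + \frac{x^5h^4}{4!} \ldots \\
D_h^2\, e^{xh} &= x^2 + x^3h + \frac{x^4h^2}{2!} + \frac{x^5h^3}{3!} \ldots \\
D_h^3\, e^{xh} &= x^3 + x^4h + \frac{x^5h^2}{2!} \ldots \ \text{etc.}
\end{aligned}
\]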
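TABLE 2

THE M.G.F. AS A GRID OPERATION

Second Zero Moment of Heart-Score difference (d) w.r.t. 3-fold and 2-fold samples taken with replacement
from two full packs

3-fold Sample
    Raw-Score             0                         1                         2                         3
    Exponential Score     e^{0}                     e^{h}                     e^{2h}                    e^{3h}
    Frequency             27/64                     27/64                     9/64                      1/64

2-fold Sample: Raw-Score 0; Exponential Score e^{0}; Frequency 9/16
    d = 0; d^2 = 0            d = 1; d^2 = 1            d = 2; d^2 = 4            d = 3; d^2 = 9
    e^{0}.e^{0} = e^{0}       e^{h}.e^{0} = e^{h}       e^{2h}.e^{0} = e^{2h}     e^{3h}.e^{0} = e^{3h}
    y_00 = 243/1024           y_10 = 243/1024           y_20 = 81/1024            y_30 = 9/1024

2-fold Sample: Raw-Score 1; Exponential Score e^{-h}; Frequency 6/16
    d = -1; d^2 = 1           d = 0; d^2 = 0            d = 1; d^2 = 1            d = 2; d^2 = 4
    e^{0}.e^{-h} = e^{-h}     e^{h}.e^{-h} = e^{0}      e^{2h}.e^{-h} = e^{h}     e^{3h}.e^{-h} = e^{2h}
    y_01 = 162/1024           y_11 = 162/1024           y_21 = 54/1024            y_31 = 6/1024

2-fold Sample: Raw-Score 2; Exponential Score e^{-2h}; Frequency 1/16
    d = -2; d^2 = 4           d = -1; d^2 = 1           d = 0; d^2 = 0            d = 1; d^2 = 1
    e^{0}.e^{-2h} = e^{-2h}   e^{h}.e^{-2h} = e^{-h}    e^{2h}.e^{-2h} = e^{0}    e^{3h}.e^{-2h} = e^{h}
    y_02 = 27/1024            y_12 = 27/1024            y_22 = 9/1024             y_32 = 1/1024

Diagonal Frequencies (x 1024)
    Y_{-2}  Y_{-1}  Y_0     Y_1     Y_2     Y_3
    27      189     414     298     87      9

\[ E(e^{dh}) = 2^{-10}\left(27e^{-2h} + 189e^{-h} + 414 + 298e^{h} + 87e^{2h} + 9e^{3h}\right); \]
\[ D_h^2 \cdot E(e^{dh}) = 2^{-10}\left(108e^{-2h} + 189e^{-h} + 298e^{h} + 348e^{2h} + 81e^{3h}\right); \]
\[ \left(D_h^2 \cdot E(e^{dh})\right)_{h=0} = 2^{-10}(108 + 189 + 298 + 348 + 81) = 1 = \mu_2(d); \]
\[ E(d^2) = 2^{-10}(0 \cdot 414 + 1 \cdot 189 + 1 \cdot 298 + 4 \cdot 27 + 4 \cdot 87 + 9 \cdot 9) = 1 = \mu_2(d). \]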
In general therefore

\[ D_h^k(e^{xh}) = x^k + x^{k+1}h + x^{k+2}\frac{h^2}{2!} + x^{k+3}\frac{h^3}{3!} \ldots \ \text{etc.} \]

When h = 0, this reduces to $x^k$. Hence we may write

\[ \left(D_h^k G_h\right)_{h=0} = \sum_{x=0}^{\infty} y_x \left(D_h^k e^{xh}\right)_{h=0} = \sum_{x=0}^{\infty} y_x \cdot x^k. \]

We make the upper limit of summation infinite in the above, since the frequency $y_x$ of the score x
vanishes for all values of x greater than its highest value. We now recall the definition of the kth
zero moment of the distribution *:

\[ \mu_k = \sum_{x=0}^{\infty} y_x \cdot x^k; \]
\[ \therefore\ \mu_k = \left(D_h^k G_h\right)_{h=0} \tag{iv} \]

The operation holds good, of course, for generating functions of the a-fold distribution, the score
difference distribution, etc.


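Example 1.—Find the moments of the distribution of the mean score for the 3-fold toss of a
tetrahedral die with face scores 1, 3, 5, 7. The g.f. of the u.s.d. is

\[ \tfrac14\left(e^{h} + e^{3h} + e^{5h} + e^{7h}\right) = \frac{e^{h}}{4}\left(1 + e^{2h} + e^{4h} + e^{6h}\right). \]

That of the 3-fold sample score-sum is

\[ \frac{e^{3h}}{64}\left(1 + e^{2h} + e^{4h} + e^{6h}\right)^3 = \frac{e^{3h}}{64}\left(1 + 3e^{2h} + 6e^{4h} + 10e^{6h} + 12e^{8h} + 12e^{10h} + 10e^{12h} + 6e^{14h} + 3e^{16h} + e^{18h}\right). \]

For the mean score we may write this as

\[ \frac{1}{64}\left(e^{3h/3} + 3e^{5h/3} + 6e^{7h/3} + 10e^{9h/3} + 12e^{11h/3} \ldots + e^{21h/3}\right). \]

We now apply the rule:

\[ \left(D_h^k \cdot e^{xh}\right)_{h=0} = x^k; \]
\[ \therefore\ \mu_k = \frac{1}{64}\left[\left(\tfrac{3}{3}\right)^k + 3\left(\tfrac{5}{3}\right)^k + 6\left(\tfrac{7}{3}\right)^k + 10\left(\tfrac{9}{3}\right)^k + 12\left(\tfrac{11}{3}\right)^k \ldots + \left(\tfrac{21}{3}\right)^k\right]. \]

By recourse to the chessboard procedure, the reader will easily see that this is the weighted mean value
of the kth power of the 3-fold sample score mean.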
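The reader who wishes to check Example 1 numerically may find the following sketch useful. It is not part of the original argument: it simply lays out the full grid of 64 equiprobable 3-fold tosses and takes the weighted mean of the kth power of the mean score. The function name `mean_score_moment` is an illustrative choice.

```python
from fractions import Fraction
from itertools import product


def mean_score_moment(faces, r, k):
    """k-th zero moment of the mean score of an r-fold toss of a die
    whose faces are equally likely (exact arithmetic)."""
    weight = Fraction(1, len(faces) ** r)      # frequency of each cell of the grid
    total = Fraction(0)
    for toss in product(faces, repeat=r):      # every cell of the r-fold grid
        total += weight * Fraction(sum(toss), r) ** k
    return total


faces = (1, 3, 5, 7)
moments = [mean_score_moment(faces, 3, k) for k in range(1, 5)]
print(moments)
```

The first two values are the mean (4) and the second zero moment, which exceeds the square of the mean by one-third of the variance of the unit sample distribution.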


* * * * * * * 


The foregoing example merely illustrates how the operation of extracting the moments 
works in so far as it dispenses with the need to visualise each step by recourse to a grid lay-out. 


* We here assume that the range of scores is positive only. More generally, $\mu_k = E(x^k)$ for a score x with weighted
summation over the whole range from $-\infty$ to $+\infty$ as for the burette universe of 14.05 and exercise below. If
$M_x = 0$ we then have $\mu_k = m_k$.




What is more important is that we can use it to derive results of general interest. In this connexion
we may note that the operation of determining the kth mean moment ($m_k$) is precisely
analogous to the foregoing procedure. If $y_X$ is the frequency of the score deviation X our
m.g.f. is

\[ G_h(X) = \sum y_X \cdot e^{Xh}. \]

The kth differential w.r.t. h is

\[ \sum y_X\left(X^k + h \cdot X^{k+1} + \frac{h^2}{2!} X^{k+2} + \frac{h^3}{3!} X^{k+3} \ldots \text{etc.}\right). \]

When h = 0 this reduces to

\[ \sum y_X \cdot X^k = E(X^k) = m_k. \]

Expressions of this type simplify in virtue of the identity $m_1 = 0 = E(X)$. En passant we may
also note that $\mu_0 = E(x^0) = 1 = E(X^0) = m_0$, i.e. $G_h(x) = 1 = G_h(X)$ when h = 0. An alternative
way of labelling our frequencies leads to a familiar series of formulae. If we write $\mu_1 = M$,
so that X = x − M:

\[ \sum y_X \cdot e^{Xh} = \sum y_x \cdot e^{(x-M)h} = e^{-Mh} \sum y_x \cdot e^{xh}; \]
\[ \therefore\ G_h(X) = e^{-Mh}\, G_h(x). \]

If we write for brevity $w = G_h(X)$ and $z = G_h(x)$,

\[
\begin{aligned}
D_h w &= e^{-Mh} D_h z - M e^{-Mh} z; \\
D_h^2 w &= e^{-Mh} D_h^2 z - 2M e^{-Mh} D_h z + M^2 e^{-Mh} z; \\
D_h^3 w &= e^{-Mh} D_h^3 z - 3M e^{-Mh} D_h^2 z + 3M^2 e^{-Mh} D_h z - M^3 e^{-Mh} z.
\end{aligned}
\]

When h = 0 so that $z = 1 = e^{-Mh}$ and $D_h^k z = \mu_k$, we have

\[ m_1 = 0; \qquad m_2 = \mu_2 - M^2; \qquad m_3 = \mu_3 - 3M\mu_2 + 2M^3. \]

Similarly, we may obtain higher mean moments in terms of zero moments. Likewise, we obtain
the familiar expressions for zero moments in terms of mean moments, if we write

\[ G_h(x) = e^{Mh}\, G_h(X). \]
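The passage from zero moments to mean moments can be sketched as follows. The code is offered only as an illustration: it uses the binomial expansion of $(x - M)^k$, which is what the successive differentiation of $e^{-Mh} G_h(x)$ yields at h = 0, and the names `zero_moments` and `mean_from_zero` are hypothetical.

```python
from fractions import Fraction
from math import comb


def zero_moments(scores, freqs, kmax):
    """Zero moments mu_0 .. mu_kmax of a discrete score distribution."""
    return [sum(f * Fraction(x) ** k for x, f in zip(scores, freqs))
            for k in range(kmax + 1)]


def mean_from_zero(mu):
    """Mean moments m_k = sum_j C(k, j) (-M)^(k-j) mu_j, where M = mu_1."""
    M = mu[1]
    return [sum(comb(k, j) * (-M) ** (k - j) * mu[j] for j in range(k + 1))
            for k in range(len(mu))]


# Unit sample distribution of the tetrahedral die of Example 1.
scores = (1, 3, 5, 7)
freqs = (Fraction(1, 4),) * 4
mu = zero_moments(scores, freqs, 3)
m = mean_from_zero(mu)
print(mu, m)
```

For this symmetrical die the third mean moment vanishes, and the second agrees with $\mu_2 - M^2$.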


In what follows we assume that our scores increase by unit steps ($\Delta x = 1$), since the
appropriate scalar transformation is straightforward if this is not so. If z = ax, so that $\Delta z = a$,
we may write

\[ \mu_k(z) = E(a^k x^k) = a^k E(x^k); \]
\[ \therefore\ \mu_k(z) = (\Delta z)^k \cdot \mu_k(x). \]

In particular, if $x_p$ is the score-sum of the p-fold sample, that of the mean score is $(x_p \div p) = x_m$
and $x_p = p \cdot x_m$, whence $\mu_k(x_p) = p^k \cdot \mu_k(x_m)$ and

\[ \mu_k(x_m) = \frac{\mu_k(x_p)}{p^k}. \]
By recourse to the properties of the m.g.f., we can establish general rules for the derivation
of the moments of the r-fold sample score-sum and mean score in terms of the moments of the
u.s.d. Later we shall see that the same rules are deducible by iteration (14.04 below) and by
recourse to the multinomial theorem (17.03). For brevity, we shall write z as the m.g.f. of the
u.s.d. of scores which increase by unit steps, so that $z^r$ is the m.g.f. of the r-fold score-sum
distribution, and we may therefore write

\[ z^r = \left(u_0 + u_1 e^{h} + u_2 e^{2h} + u_3 e^{3h} \ldots\right)^r. \]

We now recall Leibnitz' rule for deriving $D_h(z^r)$ when z = f(h), so that

\[ D_h(D_h z)^2 = 2 D_h z \cdot D_h^2 z \quad\text{and}\quad D_h(D_h z)^3 = 3(D_h z)^2 \cdot D_h^2 z. \]

Writing $r^{(k)}$ for r(r − 1)(r − 2) . . . (r − k + 1), we may then write

\[
\begin{aligned}
D_h z^r &= r z^{r-1} D_h z; \\
D_h^2 z^r &= r^{(2)} z^{r-2}(D_h z)^2 + r z^{r-1} D_h^2 z; \\
D_h^3 z^r &= r^{(3)} z^{r-3}(D_h z)^3 + 3r^{(2)} z^{r-2} D_h z \cdot D_h^2 z + r z^{r-1} D_h^3 z; \\
D_h^4 z^r &= r^{(4)} z^{r-4}(D_h z)^4 + 6r^{(3)} z^{r-3}(D_h z)^2 D_h^2 z + 3r^{(2)} z^{r-2}(D_h^2 z)^2 + 4r^{(2)} z^{r-2} D_h z \cdot D_h^3 z + r z^{r-1} D_h^4 z.
\end{aligned}
\]

If we set h = 0 in these expressions, denoting by $\mu_k$ and $\mu_k(r)$ the kth zero moments of the
u.s.d. and r-fold score-sum distribution respectively, we then have

\[ \mu_1(r) = r\mu_1 \tag{v} \]
\[ \mu_2(r) = r^{(2)}\mu_1^2 + r\mu_2 \tag{vi} \]
\[ \mu_3(r) = r^{(3)}\mu_1^3 + 3r^{(2)}\mu_1\mu_2 + r\mu_3 \tag{vii} \]
\[ \mu_4(r) = r^{(4)}\mu_1^4 + 6r^{(3)}\mu_1^2\mu_2 + 3r^{(2)}\mu_2^2 + 4r^{(2)}\mu_1\mu_3 + r\mu_4 \tag{viii} \]


Mean moments can be derived from the above in the usual way; but we can do so directly
by the preceding method. If we define z appropriately as the g.f. of the mean moments of the
u.s.d., the result is formally identical, but the expressions simplify in virtue of the identity
$m_1 = 0$, so that, with $r^{(2)} = r(r - 1)$,

\[ m_1(r) = 0 = r m_1 \tag{ix} \]
\[ m_2(r) = r m_2 \tag{x} \]
\[ m_3(r) = r m_3 \tag{xi} \]
\[ m_4(r) = 3r^{(2)} m_2^2 + r m_4 \tag{xii} \]


One use we can make of the above is the definition of the moments of the r-fold sample
score distribution defined by successive terms of $(q + p)^r$. The u.s.d. is q; p for scores 0, 1
or score deviations − p and q, whence

\[ \mu_k = p; \qquad m_k = q(-p)^k + p q^k. \]

We therefore obtain, writing $r^{(k)}$ for r(r − 1) . . . (r − k + 1),

\[
\begin{aligned}
\mu_1(r) &= rp; \\
\mu_2(r) &= r(r-1)p^2 + rp = r^2p^2 + rp(1 - p); \\
\mu_3(r) &= r^{(3)}p^3 + 3r^{(2)}p^2 + rp; \\
\mu_4(r) &= r^{(4)}p^4 + 6r^{(3)}p^3 + 7r^{(2)}p^2 + rp.
\end{aligned}
\]

Similarly, we derive

\[
\begin{aligned}
m_2(r) &= rpq; \\
m_3(r) &= rpq(q^2 - p^2) = rpq(q - p); \\
m_4(r) &= 3r(r-1)p^2q^2 + rpq(1 - 3pq) \\
       &= 3r^2p^2q^2 + rpq(1 - 6pq).
\end{aligned}
\]
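As a check on the rules for the zero moments of the r-fold score-sum, the sketch below applies them to the 2-class universe (scores 0 and 1, so that every zero moment of the u.s.d. is p) and compares the outcome with direct summation over the terms of the binomial. The helper names are illustrative and the choice r = 5, p = 1/4 merely echoes the heart-score set-up of Table 1.

```python
from fractions import Fraction
from math import comb


def falling(r, k):
    """Factorial power r(r-1)...(r-k+1)."""
    out = 1
    for i in range(k):
        out *= r - i
    return out


def rfold_zero_moments(r, mu1, mu2, mu3, mu4):
    """First four zero moments of the r-fold score-sum in terms of the
    zero moments of the unit sample distribution."""
    return (
        r * mu1,
        falling(r, 2) * mu1**2 + r * mu2,
        falling(r, 3) * mu1**3 + 3 * falling(r, 2) * mu1 * mu2 + r * mu3,
        falling(r, 4) * mu1**4 + 6 * falling(r, 3) * mu1**2 * mu2
        + 3 * falling(r, 2) * mu2**2 + 4 * falling(r, 2) * mu1 * mu3 + r * mu4,
    )


def binomial_zero_moment(r, p, k):
    """Direct summation of x^k over the terms of (q + p)^r."""
    q = 1 - p
    return sum(comb(r, x) * p**x * q**(r - x) * x**k for x in range(r + 1))


r, p = 5, Fraction(1, 4)
by_rule = rfold_zero_moments(r, p, p, p, p)
direct = tuple(binomial_zero_moment(r, p, k) for k in range(1, 5))
print(by_rule, direct)
```

The second entry is 5/2, the value obtained for the second zero moment of the heart-score sum by the grid operation.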
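Another result we may usefully invoke at a later stage is the distribution of the difference
between independent unit samples from the same universe. We may write the g.f. of the zero
moment of the u.s.d. as

\[ u = \sum_{x=0}^{x=\infty} y_x \cdot e^{xh}, \quad\text{so that}\quad \left(D_h^k u\right)_{h=0} = \mu_k. \]

Since we may regard the paired score difference as the sum of positive and negative scores
distributed in the same way, we may write for the corresponding g.f. of the negative score
distribution

\[ v = \sum_{x=0}^{x=\infty} y_x \cdot e^{-xh}, \quad\text{so that}\quad \left(D_h^k v\right)_{h=0} = (-1)^k \mu_k. \]

The m.g.f. of the difference distribution in accordance with the product rule is then uv, and

\[ \mu_k(d) = \left(D_h^k \cdot uv\right)_{h=0}. \]

By successive differentiation we have

\[
\begin{aligned}
D_h uv &= u D_h v + v D_h u; \\
D_h^2 uv &= u D_h^2 v + 2 D_h u \cdot D_h v + v D_h^2 u; \\
D_h^3 uv &= u D_h^3 v + 3 D_h u \cdot D_h^2 v + 3 D_h^2 u \cdot D_h v + v D_h^3 u; \\
D_h^4 uv &= u D_h^4 v + 4 D_h u \cdot D_h^3 v + 6 D_h^2 u \cdot D_h^2 v + 4 D_h^3 u \cdot D_h v + v D_h^4 u.
\end{aligned}
\]

In these expressions when h = 0

\[ u = 1 = v; \quad D_h u = \mu_1 = -D_h v; \quad D_h^2 u = \mu_2 = D_h^2 v; \quad D_h^3 u = \mu_3 = -D_h^3 v; \quad D_h^4 u = \mu_4 = D_h^4 v. \]

We therefore obtain

\[ \mu_1(d) = 0 = \mu_3(d); \qquad \mu_2(d) = 2\mu_2 - 2\mu_1^2 = 2m_2; \qquad \mu_4(d) = 2\mu_4 - 8\mu_1\mu_3 + 6\mu_2^2. \]

Since the mean of the difference distribution is zero, its mean moments and zero moments are identical.
Since the difference is also unchanged if we measure both scores from the mean of the u.s.d., we may
put $\mu_1 = 0$ and $\mu_k = m_k$ on the right. So we have

\[ m_2(d) = 2m_2; \qquad m_3(d) = 0; \qquad m_4(d) = 2m_4 + 6m_2^2. \]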
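The relations between the mean moments of the paired difference and those of the u.s.d. lend themselves to a direct check on the grid. The following is a sketch only; the skew three-score universe is a hypothetical one chosen so that the third mean moment of the u.s.d. does not vanish.

```python
from fractions import Fraction
from itertools import product


def mean_moment(dist, k):
    """k-th moment about the mean of a distribution {score: frequency}."""
    M = sum(f * x for x, f in dist.items())
    return sum(f * (x - M) ** k for x, f in dist.items())


def difference_distribution(dist):
    """Distribution of the difference between two independent unit samples,
    built by summing the cell frequencies of the independence grid."""
    out = {}
    for (x1, f1), (x2, f2) in product(dist.items(), repeat=2):
        out[x1 - x2] = out.get(x1 - x2, 0) + f1 * f2
    return out


# Hypothetical skew unit sample distribution (illustration only).
usd = {0: Fraction(1, 2), 1: Fraction(1, 3), 4: Fraction(1, 6)}
d = difference_distribution(usd)
m2, m4 = mean_moment(usd, 2), mean_moment(usd, 4)
print(mean_moment(d, 2), mean_moment(d, 3), mean_moment(d, 4))
```

Whatever the skewness of the u.s.d., the third moment of the difference comes out as zero and the second and fourth as $2m_2$ and $2m_4 + 6m_2^2$.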


More generally, we see by the same procedure that the odd moments of the paired difference
distribution from the same universe are all zero and the even moments are the same as those of
the 2-fold sample. If the u.s.d. is symmetrical, the odd moments of the 2-fold sample score-sum
distribution are also zero, as is true of a universe if successive terms of the expansion $(\tfrac12 + \tfrac12)^a$
define the score frequencies. As we have then seen, the frequencies of the 2-fold sample score-sum
in the range 0, 1, 2 . . . 2a then tally with successive terms of $(\tfrac12 + \tfrac12)^{2a}$. We thus arrive
at the following conclusion: if the binomial $(\tfrac12 + \tfrac12)^a$ defines the u.s.d. in the range 0 to a, the
binomial $(\tfrac12 + \tfrac12)^{2a}$ defines the distribution of both the pair-score difference with range − a
to + a and of the pair-score sum with range 0 to 2a. If the scores of the u.s.d. increase by
$\Delta x$ from m to $m + a\Delta x$, those of the pair-difference increase by $\Delta x$ from $-a\Delta x$ to $+a\Delta x$ and
those of the pair-score sum from 2m to $2m + 2a\Delta x$ with the same increment.

Throughout this section our concern has been with discrete distributions. We may extend
our scope, if we recall remarks at the end of 14.01. The frequency ($f_x$) of a score x is
numerically equivalent to the corresponding area of the histogram of unit area, being $y_x \Delta x$ if
$y_x$ is the height and $\Delta x$ the base whose midpoint is x. We may thus speak of $y_x \cdot \Delta x$ as
the frequency of a score in the range $(x \pm \tfrac12\Delta x)$. When $\Delta x$ becomes indefinitely small, we
may write $y = y_x$ as the ordinate at x and y.dx as the frequency of the score x in the range
$x \pm \tfrac12 dx$. We then have

\[ \sum f_x = 1 = \int y \cdot dx; \]
\[ \sum f_x \cdot x^k = \int y \cdot x^k \cdot dx = \mu_k; \]
\[ \sum f_x \cdot e^{xh} = \int_{0}^{\infty} y \cdot e^{xh} \cdot dx. \]

The last expression is the generating function of the zero moments, when all score values are
positive. If Y.dX is the frequency of the score deviation X in the interval $X \pm \tfrac12 dX$, the g.f.
of the mean moments is

\[ \int_{-\infty}^{\infty} Y \cdot e^{Xh} \cdot dX. \]

When the function Y is symmetrical we may write this as

\[ \int_{0}^{\infty} Y\left(e^{Xh} + e^{-Xh}\right) dX. \]

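Such expressions are of no use for determining moments unless they are integrable, as is true
when the distribution is normal (vide 14.04 below). A class of continuous distributions of special
importance is that of Gamma variates specified as such in accordance with the definition of the
Gamma function, viz.:

\[ \int_{0}^{\infty} e^{-kx} x^{n-1}\, dx = \frac{\Gamma(n)}{k^n}; \qquad \therefore\ \frac{k^n}{\Gamma(n)} \int_{0}^{\infty} e^{-kx} x^{n-1}\, dx = 1. \]

Thus the ordinate equation of a Gamma variate is

\[ y = \frac{k^n\, e^{-kx}\, x^{n-1}}{\Gamma(n)}. \]

The g.f. of the zero moments is therefore

\[ G_h = \frac{k^n}{\Gamma(n)} \int_{0}^{\infty} e^{-(k-h)x}\, x^{n-1}\, dx = \frac{k^n}{\Gamma(n)} \cdot \frac{\Gamma(n)}{(k-h)^n} = k^n (k - h)^{-n}. \]

In this expression we may expand $(k - h)^{-n}$ by the binomial theorem, viz.:

\[ (k - h)^{-n} = k^{-n} + n k^{-n-1} h + \frac{(n+1)^{(2)} k^{-n-2} h^2}{2!} + \frac{(n+2)^{(3)} k^{-n-3} h^3}{3!} \ldots \text{etc.}; \]
\[ \therefore\ G_h = \sum_{r=0}^{r=\infty} \frac{(n + r - 1)^{(r)}\, k^{-r}\, h^r}{r!}; \]
\[
\begin{aligned}
D_h G_h &= n k^{-1} + (n+1)^{(2)} k^{-2} h + \frac{(n+2)^{(3)} k^{-3} h^2}{2!} \ldots \text{etc.}; \\
D_h^2 G_h &= (n+1)^{(2)} k^{-2} + (n+2)^{(3)} k^{-3} h + \frac{(n+3)^{(4)} k^{-4} h^2}{2!} \ldots \text{etc.}; \\
D_h^3 G_h &= (n+2)^{(3)} k^{-3} + (n+3)^{(4)} k^{-4} h + \frac{(n+4)^{(5)} k^{-5} h^2}{2!} \ldots \text{etc.}
\end{aligned}
\]

When h = 0 we thus get

\[ \mu_1 = n k^{-1}; \quad \mu_2 = (n+1)^{(2)} k^{-2}; \quad \mu_3 = (n+2)^{(3)} k^{-3}; \ \ldots \ \mu_r = (n + r - 1)^{(r)} k^{-r}. \]

We shall obtain this result by another method in Chapter 15.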


EXERCISE 14.02 
1. Find the first 4 zero and mean moments of the mean score distribution of the 3-fold toss of a
tetrahedral die with face scores
(i) 1, 2, 3, 4.
(ii) 3, 6, 9, 12.
(iny 12,203. 
E Me e A 
2. Find the same moments for the distribution of both the raw-score and proportionate-score 
difference w.r.t. 3-fold and 2-fold tosses of each of the above. 


3. Investigate the first 4 mean moments of the distribution of the mean score of samples from a
3-class universe with the following u.s.d. scores: − 1, 0, + 1, frequencies $p_a$, $p_b$, $p_c$.


4. Find general expressions for the first six Pearson coefficients ($\beta_1$ to $\beta_6$) of the u.s.d. of the above
for the symmetrical case when $p_a = p_c$ (for definition of $\beta_1$ to $\beta_6$, see 14.04 below).


5. Find expressions for the 6th and 8th mean moments of the 6-fold sample from the symmetrical
3-class universe and hence for the corresponding values of $\beta_4$ and $\beta_6$ (see 14.04).


6. Derive expressions for the first 8 moments of the score difference distribution of 6-fold samples 
from the symmetrical 3-class universe and show that they are equal to the moments of the distribution 
of the sum of the difference between 6 pairs of unit samples. 


Y. Tabulate results of the above for p, = pe = 3, 3, 4, $, +5 and check all results by using the 
probability generating functions of the distribution (see Ex. 6, 11.09). 


8. Show that the m.g.f. of a Poisson system (vide Chapter 10 of Vol. I) for r unit samples from
r 2-class universes is

\[ G_h(x) = \prod_{i=1}^{i=r} \left(q_i + p_i e^{h}\right). \]


Check this result numerically for samples of 3 by putting p, =4, pa = 4, ps = 2. 
9. If we have two series of urns with parameters $p_{a.i}$ and $p_{b.i}$ definitive of the u.s.d., show that
the m.g.f. of the mean of r paired differences in sampling with replacement is

\[ G_h = \prod_{i=1}^{i=r} \left(A_i e^{h/r} + B_i + C_i e^{-h/r}\right) \]

in which expression

\[ A_i = p_{a.i}\, q_{b.i}; \qquad B_i = p_{a.i}\, p_{b.i} + q_{a.i}\, q_{b.i}; \qquad C_i = p_{b.i}\, q_{a.i}. \]
10. For the last set-up cite the value of 8, for the 3-fold sample of paired differences when 
Do. =R.Pa.: 


14.03 FACTORIAL MOMENTS


We can sometimes sidestep difficulties w.r.t. derivation or computation of moments, more especially
moments of a discrete distribution, by recourse to analogous indices in which factorials
take the place of ordinary powers of the score x or of its deviation X = (x − M) from the mean.
Thus we define zero and mean factorial moments as follows:

\[ \mu_{(k)} = \sum y_x \cdot x^{(k)} \quad\text{and}\quad m_{(k)} = \sum y_X \cdot X^{(k)} \tag{i} \]

Here we concern ourselves solely with the zero factorial moments, recalling that

\[ x^{(k)} = x(x-1)(x-2) \ldots (x-k+1) = \frac{x!}{(x-k)!} \tag{ii} \]

For the first 8 powers we have

\[
\begin{aligned}
x^{(1)} &= x; \\
x^{(2)} &= x^2 - x; \\
x^{(3)} &= x^3 - 3x^2 + 2x; \\
x^{(4)} &= x^4 - 6x^3 + 11x^2 - 6x; \\
x^{(5)} &= x^5 - 10x^4 + 35x^3 - 50x^2 + 24x; \\
x^{(6)} &= x^6 - 15x^5 + 85x^4 - 225x^3 + 274x^2 - 120x; \\
x^{(7)} &= x^7 - 21x^6 + 175x^5 - 735x^4 + 1624x^3 - 1764x^2 + 720x; \\
x^{(8)} &= x^8 - 28x^7 + 322x^6 - 1960x^5 + 6769x^4 - 13132x^3 + 13068x^2 - 5040x.
\end{aligned}
\]


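Since we can always express $x^{(k)}$ in the form

\[ x^{(k)} = \sum_{r=0}^{r=k-1} b_r \cdot x^{k-r}, \]
\[ \therefore\ E(x^{(k)}) = \mu_{(k)} = \sum_{r=0}^{r=k-1} b_r \cdot E(x^{k-r}) = \sum_{r=0}^{r=k-1} b_r \cdot \mu_{k-r}. \]

From the relations cited above

\[
\begin{aligned}
\mu_{(1)} &= \mu_1; \\
\mu_{(2)} &= E(x^2) - E(x) = \mu_2 - \mu_1; \\
\mu_{(3)} &= E(x^3) - 3E(x^2) + 2E(x) = \mu_3 - 3\mu_2 + 2\mu_1, \ \text{etc.}
\end{aligned}
\]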
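We can thus derive any ordinary moment as a series of zero factorial moments, the first eight
being

\[
\begin{aligned}
\mu_1 &= \mu_{(1)}; \\
\mu_2 &= \mu_{(2)} + \mu_{(1)}; \\
\mu_3 &= \mu_{(3)} + 3\mu_{(2)} + \mu_{(1)}; \\
\mu_4 &= \mu_{(4)} + 6\mu_{(3)} + 7\mu_{(2)} + \mu_{(1)}; \\
\mu_5 &= \mu_{(5)} + 10\mu_{(4)} + 25\mu_{(3)} + 15\mu_{(2)} + \mu_{(1)}; \\
\mu_6 &= \mu_{(6)} + 15\mu_{(5)} + 65\mu_{(4)} + 90\mu_{(3)} + 31\mu_{(2)} + \mu_{(1)}; \\
\mu_7 &= \mu_{(7)} + 21\mu_{(6)} + 140\mu_{(5)} + 350\mu_{(4)} + 301\mu_{(3)} + 63\mu_{(2)} + \mu_{(1)}; \\
\mu_8 &= \mu_{(8)} + 28\mu_{(7)} + 266\mu_{(6)} + 1050\mu_{(5)} + 1701\mu_{(4)} + 966\mu_{(3)} + 127\mu_{(2)} + \mu_{(1)}.
\end{aligned}
\]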
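The numerical coefficients in the series for ordinary moments in terms of factorial moments are what later writers call Stirling numbers of the second kind. The sketch below, which is an illustration and not the method of the text, generates them by the usual recurrence and confirms on a small hypothetical distribution that the conversion reproduces the ordinary zero moments.

```python
from fractions import Fraction


def falling(x, k):
    """Factorial power x(x-1)...(x-k+1)."""
    out = 1
    for i in range(k):
        out *= x - i
    return out


def stirling2_row(k):
    """Coefficients c_0..c_k such that x**k = sum_j c_j * x^(j)."""
    row = [1]                                   # the row for k = 0
    for n in range(1, k + 1):
        prev = row + [0]
        row = [0] + [j * prev[j] + prev[j - 1] for j in range(1, n + 1)]
    return row


def factorial_moment(dist, k):
    """Zero factorial moment of a distribution {score: frequency}."""
    return sum(f * falling(x, k) for x, f in dist.items())


def ordinary_from_factorial(dist, k):
    """Ordinary zero moment rebuilt from the factorial moments."""
    return sum(c * factorial_moment(dist, j)
               for j, c in enumerate(stirling2_row(k)))


# Hypothetical u.s.d.: scores 1, 2, 3 in the ratio 1 : 2 : 1.
dist = {1: Fraction(1, 4), 2: Fraction(1, 2), 3: Fraction(1, 4)}
print(stirling2_row(8)[1:])
print([ordinary_from_factorial(dist, k) for k in range(1, 5)])
```

The printed row for k = 8, read from the coefficient of $\mu_{(1)}$ upwards, should tally with the last line of the series.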

The advantage of this arises from the fact that simple general expressions for factorial
moments may be obtainable, when it is not possible to derive simple expressions for moments
of the more familiar sort. The following examples will show that this is so:


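(a) The Poisson Distribution

For this distribution with mean M and range x = 0 to $\infty$

\[ y_x = \frac{e^{-M} M^x}{x!}; \]
\[ \mu_{(k)} = \sum_{x=0}^{\infty} \frac{e^{-M} M^x}{x!} \cdot x^{(k)} \tag{iii} \]

In virtue of (ii) we may write

\[ \frac{M^x}{x!} \cdot x^{(k)} = \frac{M^x}{x!} \cdot \frac{x!}{(x-k)!} = M^k \cdot \frac{M^{x-k}}{(x-k)!}; \]
\[ \therefore\ \mu_{(k)} = e^{-M} \cdot M^k \sum_{x=0}^{\infty} \frac{M^{x-k}}{(x-k)!} \tag{iv} \]

All terms in the summation vanish if x < k, so that we are concerned only with x = k, k + 1,
k + 2 . . . $\infty$;

\[ \sum_{x=k}^{\infty} \frac{M^{x-k}}{(x-k)!} = \frac{M^0}{0!} + \frac{M}{1!} + \frac{M^2}{2!} + \frac{M^3}{3!} \ldots = 1 + M + \frac{M^2}{2!} + \frac{M^3}{3!} \ldots = e^{M}. \]

Whence by substitution in (iv)

\[ \mu_{(k)} = M^k \tag{v} \]

Hence we derive

\[
\begin{aligned}
\mu_1 &= \mu_{(1)} = M; \\
\mu_2 &= \mu_{(2)} + \mu_{(1)} = M^2 + M; \\
\mu_3 &= \mu_{(3)} + 3\mu_{(2)} + \mu_{(1)} = M^3 + 3M^2 + M; \\
\mu_4 &= \mu_{(4)} + 6\mu_{(3)} + 7\mu_{(2)} + \mu_{(1)} = M^4 + 6M^3 + 7M^2 + M.
\end{aligned}
\]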
The mean moments are obtainable from the zero moments by recourse to the now familiar
formulae, so that

\[ m_2 = M = m_3; \qquad m_4 = 3M^2 + M; \]
\[ \therefore\ \beta_1 = \frac{1}{M} \quad\text{and}\quad \beta_2 = 3 + \frac{1}{M} \tag{vi} \]

For the Poisson distribution we thus have the relation

\[ \beta_2 = 3 + \beta_1 \tag{vii} \]

The reader will recall (6.08 in Vol. I) that the Type III distribution is more leptokurtic for
the same measure of skewness, since

\[ \beta_2 = 3 + \tfrac{3}{2}\beta_1 \tag{viii} \]

By proceeding in the same way we derive, for the higher Pearson coefficients (p. 599),

\[ \beta_3 = \frac{10}{M} + \frac{1}{M^2}; \qquad \beta_4 = 15 + \frac{25}{M} + \frac{1}{M^2}; \]
\[ \beta_5 = \frac{105}{M} + \frac{56}{M^2} + \frac{1}{M^3}; \qquad \beta_6 = 105 + \frac{490}{M} + \frac{119}{M^2} + \frac{1}{M^3} \tag{ix} \]

In Chapter 5 of Vol. I, we have seen that the Poisson approaches the normal distribution
when M > 9. It is instructive to compare the Pearson coefficients of the normal with those
of the Poisson distribution when M = 1, 10 and 100.

                       β1       β2       β3        β4         β5        β6
Poisson   M = 1        1.0      4.0      11.0      41.0       162.0     715.0
          M = 10       0.1      3.1      1.01      17.51      11.06     155.191
          M = 100      0.01     3.01     0.1001    15.2501    1.0556    109.91
Normal                 0.0      3.0      0.0       15.0       0.0       105.0

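(b) The Rectangular Distribution

If the universe is n-fold with scores 1, 2, 3 . . . n,

\[ \mu_{(k)} = \frac{1}{n} \sum_{x=1}^{x=n} x^{(k)} \tag{x} \]

By (v) in 11.07

\[ \sum_{x=1}^{x=n} x^{(k)} = \frac{(n+1)^{(k+1)}}{k+1}; \]
\[ \therefore\ \mu_{(k)} = \frac{(n+1)^{(k+1)}}{n(k+1)} = \frac{(n+1)(n-1)^{(k-1)}}{k+1} \tag{xi} \]

Evidently,

\[ \mu_{(1)} = \frac{n+1}{2}; \qquad \mu_{(2)} = \frac{n^2 - 1}{3}; \]
\[ \mu_{(3)} = \frac{(n^2 - 1)(n - 2)}{4}; \qquad \mu_{(4)} = \frac{(n^2 - 1)(n^2 - 5n + 6)}{5}. \]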
From these, and by means of the next four of the series, we derive the following values
for the ordinary zero and mean moments:

\[
\begin{aligned}
\mu_2 &= \frac{(n+1)(2n+1)}{6}; & m_2 &= \frac{n^2 - 1}{12}; \\
\mu_3 &= \frac{n(n+1)^2}{4}; & m_3 &= 0; \\
\mu_4 &= \frac{(n+1)(2n+1)(3n^2 + 3n - 1)}{30}; & m_4 &= \frac{(n^2 - 1)(3n^2 - 7)}{240}; \\
\mu_5 &= \frac{n(n+1)^2(2n^2 + 2n - 1)}{12}; & m_5 &= 0; \\
\mu_6 &= \frac{(n+1)(2n+1)(3n^4 + 6n^3 - 3n + 1)}{42}; & m_6 &= \frac{(n^2 - 1)(3n^4 - 18n^2 + 31)}{1344}; \\
\mu_7 &= \frac{n(n+1)^2(3n^4 + 6n^3 - n^2 - 4n + 2)}{24}; & m_7 &= 0; \\
\mu_8 &= \frac{(n+1)(2n+1)(5n^6 + 15n^5 + 5n^4 - 15n^3 - n^2 + 9n - 3)}{90}; & m_8 &= \frac{(n^2 - 1)(5n^6 - 55n^4 + 239n^2 - 381)}{11520}.
\end{aligned}
\]

Hence we derive the following exact expressions for Pearson coefficients of even order, all odd
ones being of zero value:

\[ \beta_2 = \frac{9}{5} - \frac{12}{5(n^2 - 1)}; \qquad \beta_4 = \frac{27}{7} - \frac{108}{7(n^2 - 1)} + \frac{144}{7(n^2 - 1)^2} \tag{xii} \]
\[ \beta_6 = 9 - \frac{72}{n^2 - 1} + \frac{1296}{5(n^2 - 1)^2} - \frac{1728}{5(n^2 - 1)^3} \tag{xiii} \]
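A short check on the rectangular case may be useful at this point. The sketch below is illustrative: it compares direct summation of the factorial powers over the scores 1 to n with the closed form $(n+1)^{(k+1)} / n(k+1)$, and evaluates $\beta_2$ directly for comparison with $\tfrac{9}{5} - 12/5(n^2 - 1)$. The function names are hypothetical.

```python
from fractions import Fraction


def falling(x, k):
    """Factorial power x(x-1)...(x-k+1)."""
    out = 1
    for i in range(k):
        out *= x - i
    return out


def rect_factorial_moment(n, k):
    """Zero factorial moment of equiprobable scores 1..n by direct summation."""
    return Fraction(sum(falling(x, k) for x in range(1, n + 1)), n)


def rect_factorial_formula(n, k):
    """Closed form (n+1)^(k+1) / (n (k+1)) in factorial-power notation."""
    return Fraction(falling(n + 1, k + 1), n * (k + 1))


def rect_beta2(n):
    """Pearson's beta_2 = m_4 / m_2^2 for equiprobable scores 1..n."""
    M = Fraction(n + 1, 2)
    m2 = sum((x - M) ** 2 for x in range(1, n + 1)) / n
    m4 = sum((x - M) ** 4 for x in range(1, n + 1)) / n
    return m4 / m2**2


print(rect_factorial_moment(6, 3), rect_factorial_formula(6, 3))
print(rect_beta2(6), Fraction(9, 5) - Fraction(12, 5 * (6**2 - 1)))
```

With n = 6 (the cubical die) the value of $\beta_2$ is 303/175, appreciably below the limiting value 9/5 of the continuous rectangular distribution.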
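(c) A Binomial Distribution

If the definitive binomial of the raw score (x) distribution is $(p + q)^r$

\[ \mu_{(k)} = \sum_{x=0}^{x=r} x^{(k)}\, r_{(x)}\, p^x q^{r-x} \tag{xiv} \]

The derivation of an appropriate expression is easy by successive partial differentiation of the
binomial series. Thus

\[ \frac{\partial}{\partial p} \sum_{0}^{r} r_{(x)}\, p^x q^{r-x} = \sum_{0}^{r} x\, r_{(x)}\, p^{x-1} q^{r-x} = r(p + q)^{r-1}; \]
\[ \frac{\partial^2}{\partial p^2} \sum_{0}^{r} r_{(x)}\, p^x q^{r-x} = \sum_{0}^{r} x(x-1)\, r_{(x)}\, p^{x-2} q^{r-x} = r(r-1)(p + q)^{r-2}. \]

More generally we thus derive

\[ \frac{\partial^k}{\partial p^k} \sum_{0}^{r} r_{(x)}\, p^x q^{r-x} = \sum_{0}^{r} x^{(k)}\, r_{(x)}\, p^{x-k} q^{r-x} = r^{(k)}(p + q)^{r-k}; \]
\[ \therefore\ \sum_{0}^{r} x^{(k)}\, r_{(x)}\, p^x q^{r-x} = r^{(k)} p^k. \]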
Alternatively, we may proceed as follows:

\[ x^{(k)}\, r_{(x)}\, p^x q^{r-x} = \frac{r!}{x!\,(r-x)!}\, p^x q^{r-x} \cdot \frac{x!}{(x-k)!} = r^{(k)} p^k \cdot \frac{(r-k)!}{(r-x)!\,(x-k)!}\, p^{x-k} q^{r-x}. \]

If we put u = (x − k) and v = (r − k) in the above so that (r − x) = (v − u)

\[ \sum_{x=0}^{x=r} x^{(k)}\, r_{(x)}\, p^x q^{r-x} = r^{(k)} p^k \sum_{u=-k}^{u=v} \frac{v!}{u!\,(v-u)!}\, p^u q^{v-u}. \]

Since u! is infinite when u is negative, all terms in the range u = − 1 to u = − k vanish, and

\[ \sum_{x=0}^{x=r} x^{(k)}\, r_{(x)}\, p^x q^{r-x} = r^{(k)} p^k \sum_{u=0}^{u=v} v_{(u)}\, p^u q^{v-u} = r^{(k)} p^k (p + q)^v; \]
\[ \therefore\ \sum_{x=0}^{x=r} x^{(k)}\, r_{(x)}\, p^x q^{r-x} = r^{(k)} p^k; \]
\[ \mu_{(k)} = r^{(k)} p^k \tag{xv} \]


Hence we derive the following expressions for the zero moments:

\[
\begin{aligned}
\mu_2 &= r^2p^2 + rpq; \\
\mu_3 &= r^3p^3 + 3r^2p^2q + rpq(q - p); \\
\mu_4 &= r^4p^4 + 6r^3p^3q + r^2p^2q(7 - 11p) + rpq(1 - 6pq).
\end{aligned}
\]

The corresponding mean moments are

\[
\begin{aligned}
m_2 &= rpq; \\
m_3 &= rpq(q - p); \\
m_4 &= 3r^2p^2q^2 + rpq(1 - 6pq)
\end{aligned}
\tag{xvi}
\]

We derive in the same way the higher even moments, e.g.

\[ m_6 = 15r^3p^3q^3 + 5r^2p^2q^2(5 - 26pq) + rpq(1 - 30pq + 120p^2q^2) \tag{xvii} \]
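The binomial results may be checked by direct summation over the terms of $(q + p)^r$. The following sketch is an illustration only, with r = 7 and p = 1/3 chosen arbitrarily. It tests that the kth zero factorial moment is $r^{(k)} p^k$ and that the fourth and sixth mean moments agree with the expressions in powers of rpq.

```python
from fractions import Fraction
from math import comb


def falling(x, k):
    """Factorial power x(x-1)...(x-k+1)."""
    out = 1
    for i in range(k):
        out *= x - i
    return out


def binomial_dist(r, p):
    """Terms of (q + p)^r as a distribution {score: frequency}."""
    q = 1 - p
    return {x: comb(r, x) * p**x * q**(r - x) for x in range(r + 1)}


def factorial_moment(dist, k):
    return sum(f * falling(x, k) for x, f in dist.items())


def mean_moment(dist, k):
    M = sum(f * x for x, f in dist.items())
    return sum(f * (x - M) ** k for x, f in dist.items())


r, p = 7, Fraction(1, 3)          # arbitrary illustrative values
q = 1 - p
dist = binomial_dist(r, p)
v = r * p * q                     # the variance rpq
m4_formula = 3 * v**2 + v * (1 - 6 * p * q)
m6_formula = 15 * v**3 + 5 * v**2 * (5 - 26 * p * q) \
    + v * (1 - 30 * p * q + 120 * (p * q) ** 2)
print(mean_moment(dist, 4), m4_formula)
print(mean_moment(dist, 6), m6_formula)
```

Since all the arithmetic is in exact fractions, agreement of the two columns is an identity and not merely an approximation.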


EXERCISE 14.03 


1. Determine the first 4 factorial moments of the unit, 2-fold, 3-fold and 4-fold sample distribu- 
tions of an infinite universe of score values 1, 2, 3 (a) in the ratio 1:2:1; (b) in the ratio 1:4:1. 


2. From the foregoing results obtain the first four ordinary moments about the mean, and
$\beta_1$ and $\beta_2$ for each distribution.


3. Repeat the foregoing exercises for score values of — 1, 0 and + 1 in the same ratios. Compare 
the results. 


14.04 THE NORMAL DISTRIBUTION


The only tabulated sample distribution we have used in Vol. I as a sufficiently satisfactory
descriptive function in the sense defined in 14.01 is the normal. When we seek to establish in
what sense the normal integral provides a satisfactory fit to an exactly definable distribution which
involves recourse to laborious computation, it is important to be clear about the significance
level at which we deem correspondence to be satisfactory. Accordingly, we should remind
ourselves that tables of the integral or ordinate of the normal distribution always refer to the
standard score (c), i.e. the ratio of the deviation X = (x − M) of the score from its mean to
the standard deviation of the score distribution:

\[ c = \frac{X}{\sigma} = \frac{x - M}{\sigma} \tag{i} \]

The variance of the standard score is by definition:

\[ V_c = E\left(\frac{X}{\sigma}\right)^2 = \frac{E(X^2)}{\sigma^2} = \frac{\sigma^2}{\sigma^2}; \]
\[ \therefore\ V_c = 1. \]

This is why we speak of the distribution of the normal standard score as that of the normal variate 
of unit variance. When we speak of a level of significance in the same context we merely refer 
to the numerical value of the standard score. If our concern is to establish at what significance 
level the normal gives an adequate approximation of the sum of the terms of a binomial dis- 
tribution, we presuppose a criterion of adequacy (e.g. an error not exceeding 10 per cent.) and 
we must thus base our comparison on standard scores. 

For a distribution of raw-scores (x) from 0 to r defined by terms of the binomial $(q + p)^r$
the variance is rpq and the mean is rp. In standard form the score is therefore

\[ c = \frac{x - rp}{\sqrt{rpq}}. \]

Thus we define standard scores of the binomial distributions below as follows:

    p          r          c
    1/2        16         (x − 8)/2
    1/4        36         2(x − 9)/(3√3)
    1/5        25         (x − 5)/2
x — 10 
ita 400 To 


If our only object is to compare frequencies, no other precaution calls for comment. However,
our main concern in statistics is with summation of frequencies; and we use the table of the
normal integral defining the area under the curve from $-\infty$ (or zero) up to the assigned
significance level. We then have to remember to make the half interval correction of (v) in 14.01,
as explained more fully in Chapter 3 of Vol. I (p. 112). If the score increases by unit steps
($\Delta x = 1$), the boundaries of the base of the histogram enclosing an area which corresponds
to the sum of frequencies of score deviations from X = a up to X = b are respectively at $(a - \tfrac12)$
and $(b + \tfrac12)$. Thus the appropriate X co-ordinates of the fitting curve which approximately
goes through the midpoint of the uppermost margin of the histogram columns are for X and
− X respectively $(X + \tfrac12)$ and $-(X + \tfrac12)$. More generally, if the score increases by $\Delta x$ the
boundaries of the base of the histogram enclosing an area which corresponds to the sum of
score deviations from X = a up to X = b are at $(a - \tfrac12\Delta x)$ and $(b + \tfrac12\Delta x)$. The appropriate
X co-ordinates of the fitting curve for X and − X respectively are then $(X + \tfrac12\Delta x)$ and
$-(X + \tfrac12\Delta x)$.

To express our score in standard form referable to a mean M and variance $\sigma^2$ with due
regard to this correction, we must distinguish three cases:

(i) to sum frequencies from − X to $\infty$ or from $-\infty$ to − X, we write

\[ c = -\frac{(x - M + \tfrac12\Delta x)}{\sigma}; \]

(ii) to sum frequencies from $-\infty$ to X or from X to $\infty$

\[ c = \frac{(x - M + \tfrac12\Delta x)}{\sigma}; \]

(iii) to sum frequencies in the range ± X, we write

\[ c = \pm\frac{(x - M + \tfrac12\Delta x)}{\sigma}. \]


Score-sum Distribution of Unit Variance. In Chapter 16 we shall have to make extensive
use of a property of score distributions so defined that the variance is unity; and we may
appropriately dispose of it in this context. Suppose a player records as his score w the sum of
the individual results $x_a, x_b \ldots x_n$ of single trials from n identical dice weighted by a particular
constant (a, b . . . n) which need not be positive, i.e.

\[ w = a x_a + b x_b + \ldots + n x_n. \]

This is, of course, equivalent to renumbering the score faces by an appropriate scalar change
$z_a = a \cdot x_a$, $z_b = b \cdot x_b$, so that

\[ w = z_a + z_b + \ldots + z_n. \]

The effect of this scalar change is, of course,

\[ V(z_a) = a^2 V(x_a); \quad V(z_b) = b^2 V(x_b), \ \text{etc.} \]

In virtue of independence, we may write the variance of the score-sum distribution as

\[ V_w = V(z_a) + V(z_b) + \ldots + V(z_n) = a^2 V(x_a) + b^2 V(x_b) \ldots + n^2 V(x_n). \]

If the dice are identical $V(x_a) = V(x_b) = V(x_n)$, etc. $= V_x$, and

\[ V_w = (a^2 + b^2 + \ldots + n^2) V_x \tag{ii} \]

In this expression $V_x$ is the variance of the unit sample (single toss) distribution of the die, and
we make the variance of the player's score distribution identical with it if the rule of the game
prescribes that

\[ a^2 + b^2 + \ldots \text{etc.} + n^2 = 1 \tag{iii} \]

If we vary the rule of the game by prescribing that the player records the weighted sum of
the standard scores of the n trials

\[ w = a c_a + b c_b + \ldots + n c_n \tag{iv} \]

It follows from (ii), since the variance of each standard score is unity, that

\[ V_w = a^2 + b^2 + \ldots + n^2 \tag{v} \]

Thus w is then itself a score of unit variance, if

\[ a^2 + b^2 + \ldots + n^2 = 1 \tag{vi} \]
For example, $V_w = 1$ if the rule of the game prescribes either of the following systems of scoring

\[ w = \tfrac12 c_a + \tfrac12 c_b + \tfrac12 c_c + \tfrac12 c_d; \]
\[ w = \tfrac12 c_a - \tfrac12 c_b - \frac{1}{\sqrt2}\, c_c. \]
Standard Score and Scale. Since moments of a distribution depend on the scale of the score
increment, we can make them comparable with those of the normal curve of unit variance only
if we choose our scale accordingly. If a discrete score z = kx, z increases by increments
$\Delta z = k\Delta x$, and $M_z = kM_x$; but $V_z = k^2 V_x$ and $\sigma_z = k\sigma_x$, whence

\[ \frac{z - M_z}{\sigma_z} = \frac{k(x - M_x)}{k\sigma_x} = \frac{x - M_x}{\sigma_x}. \]


Example 1.—For a 2-class universe p = 0.1 and q = 0.9. Find the standard score corresponding
to a raw score x = 15 and proportionate score u = 0.15 referable to a replacement sample of 100.
The definitive binomial of the 100-fold sample is $(\tfrac{9}{10} + \tfrac{1}{10})^{100}$ so that $V_x = rpq = 9$ and $\sigma_x = 3$;
the mean raw-score is $M_x = 10$. Hence the standardised raw-score is $\tfrac13(15 - 10) = \tfrac53$. The mean
proportionate score $M_u = 0.1$, and $V_u = pq \div r = 0.0009$, whence $\sigma_u = 0.03$. The standardised
proportionate score is

\[ \frac{0.15 - 0.1}{0.03} = \frac{5}{3}. \]

Example 2.—Two tetrahedral dice have face scores: (a) 1, 2, 2, 3; (b) 3, 5, 5, 7. Why are the
standard scores respectively corresponding to score sums of 7 and 17 in a 3-fold toss equivalent?

The distribution of the 3-fold toss accords with successive terms of $(\tfrac14 + \tfrac12 + \tfrac14)^3 = (\tfrac12 + \tfrac12)^6$, the
general term of which is $2^{-6} \cdot 6_{(x)}$. This corresponds to scores of: (a) 3 + x, (b) 9 + 2x. For the
first distribution, the scale is unity ($\Delta x = 1$) and for the second, $\Delta x = 2$. The variances are $6(\tfrac14) = \tfrac32$
and $2^2 \cdot \tfrac32 = 6$. The means are 6 and 15 respectively and the standard scores are therefore

\[ \frac{7 - 6}{\sqrt{3/2}} = \sqrt{\tfrac{2}{3}} = \frac{17 - 15}{\sqrt{6}}. \]

Note that the scores have the same frequency since x = 4 in the general term of the binomial if:
(a) 3 + x = 7; (b) 9 + 2x = 17.

Thus the effect of standardising the score is to make both the scale and the origin of the 
distribution irrelevant. This circumstance throws light on the build-up of Pearson's moment 
coefficients of skewness and kurtosis (flatness), which we define so that they do not depend on 
the scale of the distribution. For any moment of a standard score distribution we may write

$$m_k(c) = E\left[\left(\frac{x - M_x}{\sigma_x}\right)^{k}\right] = \frac{m_k}{\sigma_x^{k}} = \frac{m_k}{m_2^{k/2}}.$$

Thus Pearson's coefficient of kurtosis (see Chapter 6, Vol. I) is simply the 4th moment of the
standard score distribution, i.e.

$$\beta_2 = \frac{m_4}{m_2^{2}} = m_4(c). \qquad \text{(vii)}$$

More generally, we define higher Pearson moments of even order as even moments of the
standard score distribution, e.g.

$$\beta_4 = \frac{m_6}{m_2^{3}} = m_6(c); \qquad \beta_6 = \frac{m_8}{m_2^{4}} = m_8(c). \qquad \text{(viii)}$$

Pearson's first coefficient of odd order is the square of the third moment of the standard score
distribution, being

$$\beta_1 = \frac{m_3^{2}}{m_2^{3}} = m_3^{2}(c).$$


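Squaring makes it irrelevant whether the mode lies right or left of the mean. The definition
of the next two coefficients of odd order is

$$\beta_3 = \frac{m_3 \cdot m_5}{m_2^{4}} = m_3(c)\,m_5(c); \qquad \beta_5 = \frac{m_3 \cdot m_7}{m_2^{5}} = m_3(c)\,m_7(c). \qquad \text{(ix)}$$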


In the light of previous remarks, we thus see that Pearson’s coefficients embody a method 
of comparison of distributions eliminating distortion arising from difference of scale. The 
symbolism he introduced is confusing for more than one reason. It would be preferable 
to label coefficients of even order so that the subscript would be that of the corresponding 
moment of the standard score distribution; and the definition of coefficients of odd order 
is doubly exceptionable. By defining $\beta_1$ as the square of the third moment of the standard
score distribution we eliminate the sign of the skewness, so that one distribution otherwise
identical with another may be its mirror image; and the definition of higher coefficients of
odd order has a disadvantage aside from the fact that it is inconsistent with the pattern of the
coefficients of even order. Though they must all vanish if the distribution is symmetrical,
the converse is not true. They must vanish if $m_3 = 0$, and this is consistent with the
possibility that the distribution is not symmetrical, since higher moments of odd order need not
then vanish.

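For the normal distribution it suffices to define the values of Pearson's coefficients of even
order, since all moments of odd order must vanish in virtue of symmetry. We shall therefore
write

$$m_{2k}(c) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} c^{2k}\, e^{-\frac12 c^{2}}\, dc.$$

Since the c-distribution is symmetrical

$$m_{2k}(c) = \frac{2}{\sqrt{2\pi}} \int_{0}^{\infty} c^{2k}\, e^{-\frac12 c^{2}}\, dc.$$

To evaluate this integral, we now substitute $c^2 = C$ so that

$$dc = \frac{dC}{2c} = \tfrac12 C^{-\frac12}\, dC,$$

$$\therefore\ m_{2k}(c) = \frac{1}{\sqrt{2\pi}} \int_{0}^{\infty} e^{-\frac12 C}\, C^{k-\frac12}\, dC
= \frac{1}{\sqrt{2\pi}} \int_{0}^{\infty} e^{-\frac12 C}\, C^{(k+\frac12)-1}\, dC. \qquad \text{(x)}$$

The integral in the last expression is a Gamma function, whose value is by (xi) in 6.05 (Vol. I)

$$\frac{\Gamma(k + \frac12)}{(\frac12)^{k+\frac12}},$$

$$\text{or}\qquad m_{2k}(c) = \frac{2^{k}\,\Gamma(k + \frac12)}{\sqrt{\pi}}. \qquad \text{(xi)}$$

We have already seen (p. 249, Vol. I) that:

$$\Gamma(\tfrac32) = \tfrac12\,\Gamma(\tfrac12); \qquad \Gamma(\tfrac52) = \tfrac32 \cdot \tfrac12\,\Gamma(\tfrac12);$$

$$\Gamma(\tfrac72) = \tfrac52 \cdot \tfrac32 \cdot \tfrac12\,\Gamma(\tfrac12); \qquad \Gamma(\tfrac92) = \tfrac72 \cdot \tfrac52 \cdot \tfrac32 \cdot \tfrac12\,\Gamma(\tfrac12).$$

In general therefore

$$\Gamma(k + \tfrac12) = \frac{1 \cdot 3 \cdot 5 \cdots (2k-1)}{2^{k}}\,\Gamma(\tfrac12) = \frac{1 \cdot 3 \cdot 5 \cdots (2k-1)}{2^{k}}\,\sqrt{\pi}.$$

By substitution in (xi) we thus derive

$$m_{2k}(c) = 1 \cdot 3 \cdot 5 \cdots (2k-1).$$

By setting k = 1, 2, etc. in the above we therefore derive

$$m_2(c) = 1; \qquad m_4(c) = 3; \qquad m_6(c) = 15; \qquad m_8(c) = 105.$$

Hence from (vii) and (viii)

$$\beta_2 = 3; \qquad \beta_4 = 15; \qquad \beta_6 = 105. \qquad \text{(xii)}$$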
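The passage from the Gamma function to the product of odd integers is easily tested numerically. The sketch below relies only on `math.gamma` from the Python standard library; the function names are illustrative assumptions.

```python
from math import gamma, pi, sqrt


def normal_even_moment(k):
    """m_2k(c) = 2^k * Gamma(k + 1/2) / sqrt(pi) for the unit normal score."""
    return 2 ** k * gamma(k + 0.5) / sqrt(pi)


def odd_product(k):
    """1 * 3 * 5 * ... * (2k - 1)."""
    result = 1
    for j in range(1, 2 * k, 2):
        result *= j
    return result


for k in range(1, 5):
    print(2 * k, round(normal_even_moment(k), 6), odd_product(k))
```

For k = 1, 2, 3, 4 the two columns should agree at 1, 3, 15 and 105, the last three being the normal values of the even Pearson coefficients.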
The Square Normal Standard Score. We may here pause to notice that the foregoing
examination of the moments of the distribution of the normal score of unit variance leads at once
to the derivation of the zero moments of the distribution of its square ($C = c^2$). By definition
the kth zero moment of the C-distribution is

$$\mu_k(C) = E(C^{k}) = E(c^{2k}),$$

$$\therefore\ \mu_k(C) = m_{2k}(c), \qquad \text{(xiii)}$$

$$\therefore\ \mu_1(C) = 1; \quad \mu_2(C) = 3; \quad \mu_3(C) = 15; \quad \mu_4(C) = 105, \text{ etc.}$$

We shall have occasion to examine the meaning of the results last stated more fully in 15.02.
Meanwhile, the reader may with profit recall the Type III variate (p. 257, Vol. I) whose p.d.
equation is

$$f(C) = \frac{(\frac12)^{\frac12}}{\Gamma(\frac12)}\, e^{-\frac12 C}\, C^{-\frac12}. \qquad \text{(xiv)}$$

The kth zero moment of the above distribution is

$$\mu_k(C) = \frac{1}{\sqrt{2\pi}} \int_{0}^{\infty} e^{-\frac12 C}\, C^{-\frac12} \cdot C^{k}\, dC
= \frac{1}{\sqrt{2\pi}} \int_{0}^{\infty} e^{-\frac12 C}\, C^{(k+\frac12)-1}\, dC.$$

The above is identical with (x), whence

$$\mu_k(C) = m_{2k}(c).$$

We may therefore write

$$\mu_k(C) = \mu_k(c^{2}). \qquad \text{(xv)}$$

Thus (xiv) defines the distribution whose moments of any order are identical with those of the
distribution of a square normal standard score, as we shall establish by a different procedure
in 15.02 below.

M.G.F. of the Normal Distribution. From (vii)-(viii) above, we easily obtain the moments
of the normal distribution when the variance is not unity. We may express (vii)-(viii) and (xii)
in the form

$$\frac{m_{2k+2}}{m_2^{k+1}} = \beta_{2k} = (2k+1)(2k-1)(2k-3) \cdots 5 \cdot 3 \cdot 1.$$

Whence by putting k = 0, 1, 2, etc. we have the following results:

$$m_2 = V; \qquad m_4 = 3V^{2}; \qquad m_6 = 15V^{3}; \qquad m_8 = 105V^{4}. \qquad \text{(xvi)}$$


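We shall later make use of the moment generating function to derive certain useful properties
of normal distributions. Before we derive its form, we may show that the following function
does in fact generate the moments defined by (xvi):

$$G_h = e^{\frac12 V h^{2}}. \qquad \text{(xvii)}$$

By expansion in the usual way

$$G_h = 1 + \frac{Vh^{2}}{2} + \frac{V^{2}h^{4}}{8} + \frac{V^{3}h^{6}}{48} + \frac{V^{4}h^{8}}{384} + \ldots$$

Whence we obtain

$$D_h G_h = Vh + \frac{V^{2}h^{3}}{2} + \frac{V^{3}h^{5}}{8} + \frac{V^{4}h^{7}}{48} + \ldots;$$

$$D_h^{2} G_h = V + \frac{3V^{2}h^{2}}{2} + \frac{5V^{3}h^{4}}{8} + \frac{7V^{4}h^{6}}{48} + \ldots;$$

$$D_h^{3} G_h = 3V^{2}h + \frac{5V^{3}h^{3}}{2} + \frac{7V^{4}h^{5}}{8} + \ldots;$$

$$D_h^{4} G_h = 3V^{2} + \frac{15V^{3}h^{2}}{2} + \frac{35V^{4}h^{4}}{8} + \ldots;$$

$$D_h^{5} G_h = 15V^{3}h + \frac{35V^{4}h^{3}}{2} + \ldots;$$

$$D_h^{6} G_h = 15V^{3} + \frac{105V^{4}h^{2}}{2} + \ldots$$

Whence we have

$$(D_h \cdot G_h)_{h=0} = 0; \qquad (D_h^{2} \cdot G_h)_{h=0} = V; \qquad (D_h^{3} \cdot G_h)_{h=0} = 0;$$

$$(D_h^{4} \cdot G_h)_{h=0} = 3V^{2}; \qquad (D_h^{5} \cdot G_h)_{h=0} = 0; \qquad (D_h^{6} \cdot G_h)_{h=0} = 15V^{3}, \text{ etc.}$$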
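Since the kth derivative at h = 0 is k! times the coefficient of $h^k$ in the expansion, the moments can be read off the series without any differentiation. The following is a minimal sketch under that reading; the function name and the illustrative value V = 2 are assumptions of the example.

```python
from fractions import Fraction
from math import factorial


def normal_moment_from_mgf(order, variance):
    """k! times the coefficient of h^k in exp(V h^2 / 2)."""
    if order % 2:                       # the series contains even powers of h only
        return Fraction(0)
    j = order // 2
    coefficient = Fraction(variance) ** j / (2 ** j * factorial(j))
    return factorial(order) * coefficient


V = Fraction(2)                         # illustrative variance
moments = [normal_moment_from_mgf(k, V) for k in range(1, 9)]
print(moments)
```

With V = 2 the even entries should equal V, 3V², 15V³ and 105V⁴, i.e. 2, 12, 120 and 1680, in agreement with (xvi).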


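The derivation of (xvii) is as follows. By definition,

$$G_h = \frac{1}{\sqrt{2\pi V}} \int_{-\infty}^{\infty} e^{-\frac{X^{2}}{2V}}\, e^{hX}\, dX$$

$$= \frac{1}{\sqrt{2\pi V}} \int_{-\infty}^{\infty} \exp\left[-\frac{X^{2} - 2VhX + V^{2}h^{2}}{2V}\right] e^{\frac12 Vh^{2}}\, dX$$

$$= \frac{e^{\frac12 Vh^{2}}}{\sqrt{2\pi V}} \int_{-\infty}^{\infty} \exp\left[-\frac{(X - Vh)^{2}}{2V}\right] dX.$$

If we put $(X - Vh) = z$, so that $dz = dX$,

$$G_h = e^{\frac12 Vh^{2}} \cdot \frac{1}{\sqrt{2\pi V}} \int_{-\infty}^{\infty} e^{-\frac{z^{2}}{2V}}\, dz.$$

In this expression

$$\frac{1}{\sqrt{2\pi V}} \int_{-\infty}^{\infty} e^{-\frac{z^{2}}{2V}}\, dz = 1,$$

$$\therefore\ G_h = e^{\frac12 Vh^{2}}.$$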


Thus the m.g.f. of the score-sum (and hence of the mean score with appropriate change of scale)
of r unit samples taken from the same normal universe is

$$G_h^{\,r} = e^{\frac12 rVh^{2}}.$$

This is the m.g.f. of a normal distribution of variance $rV$. That of the mean score is the m.g.f.
of a normal distribution of variance $rV \div r^{2} = V \div r$. The m.g.f. of the score-sum of two
samples from normal universes with u.s.d. variance $V_a$ and $V_b$ is

$$e^{\frac12 V_a h^{2}} \cdot e^{\frac12 V_b h^{2}} = e^{\frac12 (V_a + V_b) h^{2}}.$$

This is the m.g.f. of a normal distribution of variance $(V_a + V_b)$. In virtue of symmetry
(11.08, p. 473) it is also that of the distribution of the score difference. We may sum up these
results as follows:

(i) If the u.s.d. of a universe is normal with variance $V$, the distribution of the score-sum
of the a-fold sample is normal with variance $aV$ and that of the mean score is normal
with variance $V \div a$;

(ii) the distribution of the sum and difference of a-fold and b-fold samples from the same
normal universe is normal with variance $(a + b)V = V_a + V_b$.


14.05 MOMENTS OF THE DISTRIBUTION OF THE MEAN


In Chapter 7 of Vol. I we have seen that it is always possible to express in the form V; = t. V, 
the variance (V;) of the random t-fold sample distribution of the score-sum in terms of the 
variance V,, of the unit sample distribution. We shall now employ results obtained in 11.02- 
11.03 to determine the value of higher mean moments of the distribution of the score-sum (and 
hence of the mean score) of t-fold samples in terms of the mean moments of the unit sampling 
distribution of the parent universe; and hence to establish an important conclusion with refer- 
ence to the distribution of sample means, already foreshadowed in 14.02. Our assumptions 
are: (i) that the samples are random, i.e. that choice of one item is independent of that of another;
(ii) that none of the moments concerned is infinite. The last condition is true of any dis- 
tribution of discrete scores. In virtue of (i), the relevant formulae are those derived by 
application of the product rule. So we can derive the ensuing results by recourse to generating 
functions, as indicated in 14.02. Here we shall do so by a more elementary procedure. 

It will be convenient to denote the kth mean moment of the unit sampling distribution by
$m_k$ and that of the t-fold score-sum by $m_k(t)$. We now recall definitions given in 11.03, and write
the 3rd moment of the distribution of the score-sum of the (a + b)-fold sample in terms of those
of the a-fold and b-fold samples as

$$m_3(a + b) = E[a + b - (M_a + M_b)]^{3} = E(a - M_a + b - M_b)^{3}$$

$$= E(a - M_a)^{3} + 3E(a - M_a)^{2}(b - M_b) + 3E(a - M_a)(b - M_b)^{2} + E(b - M_b)^{3}$$

$$= m_3(a) + 3m_2(a) \cdot m_1(b) + 3m_1(a) \cdot m_2(b) + m_3(b).$$

Since the first mean moment of any distribution is zero

$$m_3(a + b) = m_3(a) + m_3(b),$$

$$\therefore\ m_3(t + 1) = m_3(t) + m_3.$$

Whence we have $m_3(2) = 2m_3$, $m_3(3) = 3m_3$, and in general

$$m_3(t) = t \cdot m_3.$$

In the same way we may write

$$m_4(t + 1) = m_4(t) + 4m_3(t) \cdot m_1 + 6m_2(t) \cdot m_2 + 4m_1(t) \cdot m_3 + m_4$$

$$= m_4(t) + 6m_2(t) \cdot m_2 + m_4.$$

Whence we derive

$$m_4(2) = 2m_4 + 6m_2^{2}; \qquad m_4(3) = 3m_4 + 18m_2^{2};$$

$$m_4(4) = 4m_4 + 36m_2^{2}; \qquad m_4(5) = 5m_4 + 60m_2^{2}.$$

And in general, writing $t^{(r)}$ for $t(t-1) \cdots (t-r+1)$,

$$m_4(t) = t \cdot m_4 + 3t^{(2)}\, m_2^{2}.$$

Similarly we may derive

$$m_5(t + 1) = m_5(t) + 10m_3(t) \cdot m_2 + 10m_2(t) \cdot m_3 + m_5.$$

Whence by iteration, we obtain

$$m_5(2) = 2m_5 + 20m_3 \cdot m_2; \qquad m_5(3) = 3m_5 + 60m_3 \cdot m_2;$$

$$m_5(4) = 4m_5 + 120m_3 \cdot m_2; \qquad m_5(5) = 5m_5 + 200m_3 \cdot m_2.$$

And in general

$$m_5(t) = t \cdot m_5 + 10t^{(2)}\, m_3 \cdot m_2.$$

We may derive similar expressions involving figurate number coefficients for higher mean
moments in the same way, and may tabulate the first eight as follows:

$$m_1(t) = 0$$

$$m_2(t) = t \cdot m_2 \qquad \text{(i)}$$

$$m_3(t) = t \cdot m_3 \qquad \text{(ii)}$$

$$m_4(t) = t \cdot m_4 + 3t^{(2)}\, m_2^{2} \qquad \text{(iii)}$$

$$m_5(t) = t \cdot m_5 + 10t^{(2)}\, m_3 \cdot m_2 \qquad \text{(iv)}$$

$$m_6(t) = t \cdot m_6 + 15t^{(2)}\, m_4 \cdot m_2 + 10t^{(2)}\, m_3^{2} + 15t^{(3)}\, m_2^{3} \qquad \text{(v)}$$

$$m_7(t) = t \cdot m_7 + 21t^{(2)}\, m_5 \cdot m_2 + 35t^{(2)}\, m_4 \cdot m_3 + 105t^{(3)}\, m_3 \cdot m_2^{2} \qquad \text{(vi)}$$

$$m_8(t) = t \cdot m_8 + 28t^{(2)}\, m_6 \cdot m_2 + 56t^{(2)}\, m_5 \cdot m_3 + 35t^{(2)}\, m_4^{2} + 210t^{(3)}\, m_4 \cdot m_2^{2} + 280t^{(3)}\, m_3^{2} \cdot m_2 + 105t^{(4)}\, m_2^{4} \qquad \text{(vii)}$$
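The tabulated relations lend themselves to an exact check: build the score-sum distribution of the t-fold sample by repeated application of the product rule, compute its mean moments directly, and compare them with the expressions in terms of the unit sample moments. The sketch below does so for the 2nd to 6th moments; the skew unit distribution chosen is purely hypothetical, and the helper names are assumptions of the example.

```python
from fractions import Fraction


def convolve(dist_a, dist_b):
    """Distribution of the sum of two independent scores (product rule)."""
    out = {}
    for xa, fa in dist_a.items():
        for xb, fb in dist_b.items():
            out[xa + xb] = out.get(xa + xb, 0) + fa * fb
    return out


def t_fold_sum(unit, t):
    """Score-sum distribution of a random t-fold sample."""
    dist = dict(unit)
    for _ in range(t - 1):
        dist = convolve(dist, unit)
    return dist


def mean_moment(dist, k):
    """kth moment about the mean."""
    mean = sum(x * f for x, f in dist.items())
    return sum((x - mean) ** k * f for x, f in dist.items())


# Hypothetical skew unit sample distribution (scores 0, 1 and 3).
unit = {0: Fraction(1, 2), 1: Fraction(1, 3), 3: Fraction(1, 6)}
m = {k: mean_moment(unit, k) for k in range(2, 7)}

t = 5
t2 = t * (t - 1)                # t^(2)
t3 = t * (t - 1) * (t - 2)      # t^(3)
dist_t = t_fold_sum(unit, t)

predicted = {
    2: t * m[2],
    3: t * m[3],
    4: t * m[4] + 3 * t2 * m[2] ** 2,
    5: t * m[5] + 10 * t2 * m[3] * m[2],
    6: t * m[6] + 15 * t2 * m[4] * m[2] + 10 * t2 * m[3] ** 2 + 15 * t3 * m[2] ** 3,
}
observed = {k: mean_moment(dist_t, k) for k in predicted}
print(observed == predicted)
```

Exact fractions are used throughout, so that agreement is a matter of identity and not of rounding.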


To determine the mean moments of the corresponding mean score or proportionate score,
it is merely necessary to make the appropriate scalar change, viz.:

$$m_k(ax) = a^{k} \cdot E(X^{k}).$$


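The scalar factor $a^k$ cancels out in the denominator and numerator of the Pearson β-coefficients.
So Pearson coefficients of the same order are identical for score-sum and score-mean or
proportionate-score distributions; and we now have all the data for expressing Pearson coefficients
of the score-sum or mean-score distribution in terms of those of the distribution of the unit
sample. We shall write the Pearson coefficient of order n for the mean-score or score-sum
distribution of the t-fold sample as ${}_{(t)}\beta_n$ and that of the unit sample as ${}_{(1)}\beta_n$. By definition, the
values for odd and even order are respectively:

Odd

$${}_{(1)}\beta_{2s-1} = \frac{m_3 \cdot m_{2s+1}}{m_2^{s+2}}; \qquad {}_{(t)}\beta_{2s-1} = \frac{m_3(t) \cdot m_{2s+1}(t)}{m_2^{s+2}(t)}.$$

Even

$${}_{(1)}\beta_{2s-2} = \frac{m_{2s}}{m_2^{s}}; \qquad {}_{(t)}\beta_{2s-2} = \frac{m_{2s}(t)}{m_2^{s}(t)}.$$

By recourse to (i)-(vii) above, we therefore derive

$${}_{(t)}\beta_1 = \frac{(t\,m_3)^{2}}{(t\,m_2)^{3}} = \frac{{}_{(1)}\beta_1}{t},$$

$${}_{(t)}\beta_2 = \frac{t\,m_4 + 3t^{(2)} m_2^{2}}{(t\,m_2)^{2}} = 3 + \frac{{}_{(1)}\beta_2 - 3}{t}.$$

In this way we may proceed to tabulate results for the first 6 Pearson coefficients as follows,
$\beta_n$ on the right standing for the unit sample coefficient ${}_{(1)}\beta_n$:

$${}_{(t)}\beta_1 = \frac{\beta_1}{t}$$

$${}_{(t)}\beta_2 = 3 + \frac{\beta_2 - 3}{t}$$

$${}_{(t)}\beta_3 = \frac{\beta_3 + 10(t-1)\beta_1}{t^{2}}$$

$${}_{(t)}\beta_4 = \frac{\beta_4 + 15(t-1)\beta_2 + 10(t-1)\beta_1 + 15(t-1)(t-2)}{t^{2}}$$

$${}_{(t)}\beta_5 = \frac{\beta_5 + 21(t-1)\beta_3 + 35(t-1)\beta_1\beta_2 + 105(t-1)(t-2)\beta_1}{t^{3}}$$

$${}_{(t)}\beta_6 = \frac{\beta_6 + 28(t-1)\beta_4 + 56(t-1)\beta_3 + 35(t-1)\beta_2^{2} + 210(t-1)(t-2)\beta_2 + 280(t-1)(t-2)\beta_1 + 105(t-1)(t-2)(t-3)}{t^{3}} \qquad \text{(ii)}$$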


We may thus say that ${}_{(t)}\beta_1$, ${}_{(t)}\beta_3$, ${}_{(t)}\beta_5$ all approach zero for large values of t and
${}_{(t)}\beta_2$, ${}_{(t)}\beta_4$, ${}_{(t)}\beta_6$ respectively approach limits of 3, 15 and 105. On the assumption that
none of the moments involved is infinite, the Pearson β coefficients of any order w.r.t. the
distribution of the score-sum or mean score of random t-fold samples from any universe
therefore approach limiting values identical with those of the corresponding β coefficients of the normal
distribution. Accordingly, a normal distribution is suitable to describe the distribution of the
t-fold sample mean value of any score regardless of its own unit sampling distribution, if the
sample size t is sufficiently large, and the moments of the unit sample distribution are all finite
(or zero).

Certain distributions already dealt with in Vol. I will provide occasion for illustration of
the implications of the foregoing theorem.

(a) Poisson Universe


That of the unit sample is a highly skew distribution which is also leptokurtic ($\beta_2 > 3$),
i.e. steeper in the region of the mode than the normal. For successive values in the range
x = 0, 1, 2 ..., the frequency equation is

$$y_x = \frac{e^{-M} M^{x}}{x!}.$$

The skewness depends on the single parameter in the above, i.e. the mean value (M) of the
score x. From the results already touched on in Chapter 6 of Vol. I, and set forth more
fully in (x) of 14.03, we obtain the following values for the Pearson coefficients, when M = 1:

$${}_{(1)}\beta_1 = 1; \qquad {}_{(1)}\beta_2 = 4; \qquad {}_{(1)}\beta_3 = 11; \qquad {}_{(1)}\beta_4 = 41.$$

By recourse to the preceding theorem, we obtain for the distribution of the score mean of 10-fold
and 20-fold samples,


Sample size     β1      β2      β3        β4
10              0.10    3.10    1.01      17.51
20              0.05    3.05    0.5025    16.25
Normal          0.0     3.0     0.0       15.0
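The Poisson figures follow from the unit sample coefficients by division of the moment relations (i)-(v) of this section by the appropriate power of $t\,m_2$. The sketch below takes the first four coefficients in that form; the function name is an illustrative assumption, and the formulas used are those obtained by the division just described.

```python
from fractions import Fraction


def t_fold_betas(b1, b2, b3, b4, t):
    """First four Pearson coefficients of the t-fold score-sum (or mean score)
    in terms of those of the unit sample."""
    tb1 = b1 / t
    tb2 = 3 + (b2 - 3) / t
    tb3 = (b3 + 10 * (t - 1) * b1) / t ** 2
    tb4 = (b4 + 15 * (t - 1) * b2 + 10 * (t - 1) * b1
           + 15 * (t - 1) * (t - 2)) / t ** 2
    return tb1, tb2, tb3, tb4


# Unit sample coefficients of the Poisson distribution with M = 1.
unit = tuple(Fraction(v) for v in (1, 4, 11, 41))

for t in (10, 20, 1000):
    print(t, [float(b) for b in t_fold_betas(*unit, t)])
```

The row for t = 1000 is added merely to exhibit the drift towards the normal values 0, 3, 0 and 15.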


(b) Rectangular Universe
For an n-fold universe of score values 1, 2, 3 ... n, the expression for the score-frequency of
the unit sample is

$$y_x = \frac{1}{n}.$$

The odd mean moments are all of zero value, as demonstrated on p. 594, and

$${}_{(1)}\beta_2 = \frac{9}{5} - \frac{12}{5(n^{2} - 1)}; \qquad {}_{(1)}\beta_4 = \frac{27}{7} - \frac{108}{7(n^{2} - 1)} + \frac{144}{7(n^{2} - 1)^{2}};$$

$${}_{(1)}\beta_6 = 9 - \frac{72}{n^{2} - 1} + \frac{1296}{5(n^{2} - 1)^{2}} - \frac{1728}{5(n^{2} - 1)^{3}}.$$

For the 6-fold (n = 6) rectangular universe of the ordinary cubical die we therefore have

$${}_{(1)}\beta_2 = 1.731; \qquad {}_{(1)}\beta_4 = 3.433; \qquad {}_{(1)}\beta_6 = 7.146.$$

For the mean score distribution of the t-fold toss, we obtain from the foregoing theorem,


No. of tosses    β1    β2       β3    β4        β5    β6
6                0     2.789    0     12.036    0     67.520
12               0     2.894    0     13.466    0     84.586
18               0     2.930    0     13.966    0     91.002
Normal           0     3.0      0     15.0      0     105.0
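For the cubical die the coefficients of the unit sample may be had either from closed expressions in $(n^2 - 1)$ or by direct summation over the n scores. The sketch below does both for $\beta_2$ and $\beta_4$ and then applies the theorem for a symmetrical unit distribution. The closed forms written here are the standard ones for a rectangular distribution of n consecutive scores, stated as an assumption of the example, and the function names are illustrative.

```python
from fractions import Fraction


def rectangular_betas(n):
    """beta_2 and beta_4 of the scores 1..n from closed forms in u = n^2 - 1."""
    u = Fraction(n * n - 1)
    b2 = Fraction(9, 5) - Fraction(12, 5) / u
    b4 = Fraction(27, 7) - Fraction(108, 7) / u + Fraction(144, 7) / u ** 2
    return b2, b4


def direct_betas(n):
    """The same two coefficients by direct summation of mean moments."""
    mean = Fraction(n + 1, 2)

    def m(k):
        return sum((x - mean) ** k for x in range(1, n + 1)) / n

    return m(4) / m(2) ** 2, m(6) / m(2) ** 3


def symmetric_t_fold(b2, b4, t):
    """beta_2 and beta_4 of the t-fold sample when odd moments vanish."""
    tb2 = 3 + (b2 - 3) / t
    tb4 = (b4 + 15 * (t - 1) * b2 + 15 * (t - 1) * (t - 2)) / t ** 2
    return tb2, tb4


b2, b4 = rectangular_betas(6)
print(float(b2), float(b4))
for t in (6, 12, 18):
    tb2, tb4 = symmetric_t_fold(b2, b4, t)
    print(t, round(float(tb2), 3), round(float(tb4), 3))
```

Agreement between `rectangular_betas` and `direct_betas` serves as a check on the closed forms themselves.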


(c) A Binomial Universe 


The determination of the higher moments of the Binomial distribution, though elementary,
is somewhat laborious by more usual methods. The relations established above permit us to
compute both moments and Pearson coefficients by a shorter route. The procedure depends
on a theorem established in Chapter 7 of Vol. I, viz.: if successive terms of the binomial $(p + q)^{a}$
define the unit sample score distribution of an (a + 1)-fold universe, successive terms of the
binomial $(p + q)^{at}$ define the random distribution of the score-sum or mean score of the t-fold
sample therefrom. If we make a = 1, denoting by ${}_{(1)}m_k$ the kth mean moment of the
distribution whose definitive binomial is $(p + q)^{1}$, the corresponding moment ${}_{(t)}m_k$ is that of the
distribution of the t-fold sample score in the taxonomic domain of a binary universe. For
its unit sample the mean is then p and the distribution is simply


Raw Score (x)       0       1
Score Deviation     −p      (1 − p) = q
Frequency (y)       q       p


Thus the kth mean moment of the unit sample distribution is

$$m_k = q(-p)^{k} + p\,q^{k}.$$

Whence we derive

$$m_2 = qp^{2} + pq^{2} = pq(p + q) = pq;$$

$$m_3 = pq^{3} - qp^{3} = pq(q^{2} - p^{2}) = pq(q - p)(q + p) = pq(q - p);$$

$$m_4 = qp^{4} + pq^{4} = pq(p^{3} + q^{3}) = pq(p^{2} - pq + q^{2}) = pq(1 - 3pq);$$

$$m_5 = pq^{5} - qp^{5} = pq(q^{4} - p^{4}) = pq(q^{2} - p^{2})(q^{2} + p^{2}) = pq(q - p)(1 - 2pq);$$

$$m_6 = qp^{6} + pq^{6} = pq(1 - 5p^{4}q - 10p^{3}q^{2} - 10p^{2}q^{3} - 5pq^{4}) = pq(1 - 5pq + 5p^{2}q^{2});$$

$$m_7 = pq^{7} - qp^{7} = pq(q^{6} - p^{6}) = pq(q^{3} - p^{3})(q^{3} + p^{3}) = pq(q - p)(1 - pq)(1 - 3pq);$$

$$m_8 = qp^{8} + pq^{8} = pq\{1 - 7pq(1 - 2pq + p^{2}q^{2})\}.$$
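We may write the general formulae for the Pearson coefficients as follows:

$${}_{(1)}\beta_{2k-1} = \frac{m_3 \cdot m_{2k+1}}{m_2^{k+2}} = \frac{(q - p)(q^{2k} - p^{2k})}{p^{k}q^{k}};$$

$${}_{(1)}\beta_{2k} = \frac{m_{2k+2}}{m_2^{k+1}} = \frac{p^{2k+1} + q^{2k+1}}{p^{k}q^{k}}.$$

Hence we get

$${}_{(1)}\beta_1 = \frac{1 - 4pq}{pq}; \qquad {}_{(1)}\beta_2 = \frac{1 - 3pq}{pq};$$

$${}_{(1)}\beta_3 = \frac{(1 - 4pq)(1 - 2pq)}{p^{2}q^{2}}; \qquad {}_{(1)}\beta_4 = \frac{1 - 5pq + 5p^{2}q^{2}}{p^{2}q^{2}};$$

$${}_{(1)}\beta_5 = \frac{(1 - 4pq)(1 - pq)(1 - 3pq)}{p^{3}q^{3}}; \qquad {}_{(1)}\beta_6 = \frac{1 - 7pq(1 - 2pq + p^{2}q^{2})}{p^{3}q^{3}}.$$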
We thus obtain

$${}_{(t)}\beta_1 = \frac{1 - 4pq}{tpq};$$

$${}_{(t)}\beta_2 = 3 + \frac{1 - 6pq}{tpq};$$

$${}_{(t)}\beta_3 = \frac{(1 - 4pq)\{1 + pq(10t - 12)\}}{t^{2}p^{2}q^{2}};$$

$${}_{(t)}\beta_4 = 15 + \frac{1 - 30pq(1 - 4pq) + 5tpq(5 - 26pq)}{t^{2}p^{2}q^{2}};$$

$${}_{(t)}\beta_5 = \frac{(1 - 4pq)\{1 + 4pq(14t - 15) + 3p^{2}q^{2}(35t^{2} - 154t + 120)\}}{t^{3}p^{3}q^{3}};$$

$${}_{(t)}\beta_6 = 105 + \frac{1 - 7pq(1 - 2pq + 16p^{2}q^{2})}{t^{3}p^{3}q^{3}} + \frac{7(t - 1)}{t^{3}p^{2}q^{2}}\,[17 - 238pq + 704p^{2}q^{2} + 10tpq(7 - 34pq)].$$

When $p = \tfrac12 = q$,

$${}_{(1)}\beta_1 = 0; \qquad {}_{(1)}\beta_2 = 1;$$

$${}_{(1)}\beta_3 = 0; \qquad {}_{(1)}\beta_4 = 1;$$

$${}_{(1)}\beta_5 = 0; \qquad {}_{(1)}\beta_6 = 1.$$

In Chapter 3 of Vol. I we have seen that the normal is a good fitting curve at the 2σ level for the
discrete distributions whose definitive binomials are respectively

(a) $(\tfrac12 + \tfrac12)^{16}$; (b) $(\tfrac{9}{10} + \tfrac{1}{10})^{100}$.

From the above values for the Pearson coefficients of the 2-class distribution defined by the
unit binomial $(\tfrac12 + \tfrac12)^{1}$, we obtain the following results for the distributions of unit and 10-fold
sample mean scores, when the binomial definitive of a universe of 17 score classes is $(\tfrac12 + \tfrac12)^{16}$.


Sample size     β1     β2       β3     β4
1               0      2.875    0      13.2
10              0      2.988    0      14.813


For a universe of 101 score classes specified by successive terms of the binomial $(\tfrac{9}{10} + \tfrac{1}{10})^{100}$,
we obtain


Sample size     β1       β2       β3       β4
1               0.07     3.05     0.71     16.469
10              0.007    3.005    0.071    15.148
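The two small tables above may be reproduced directly from the expressions for the first four coefficients of the distribution whose definitive binomial is $(p + q)^{t}$. In the sketch below t stands for the total exponent of the binomial, so that the 10-fold sample from the 17-class universe corresponds to t = 160; the function name is an assumption of the example.

```python
from fractions import Fraction


def binomial_betas(p, t):
    """First four Pearson coefficients of the binomial (q + p)^t."""
    q = 1 - p
    e = p * q
    b1 = (1 - 4 * e) / (t * e)
    b2 = 3 + (1 - 6 * e) / (t * e)
    b3 = (1 - 4 * e) * (1 + e * (10 * t - 12)) / (t * e) ** 2
    b4 = 15 + (1 - 30 * e * (1 - 4 * e) + 5 * t * e * (5 - 26 * e)) / (t * e) ** 2
    return b1, b2, b3, b4


for p, t in ((Fraction(1, 2), 16), (Fraction(1, 2), 160),
             (Fraction(1, 10), 100), (Fraction(1, 10), 1000)):
    print(p, t, [round(float(b), 3) for b in binomial_betas(p, t)])
```

Setting t = 1 recovers the coefficients of the unit binomial, which provides a convenient check on the general expressions.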


It is evident from these results that the distribution of the mean of a sample as small as 10 
closely conforms to the normal pattern for a discrete distribution as flat as may be, and for a 
relatively steep and skew unimodal distribution. It is also noteworthy that the range of the unit 
sampling distribution consistent with this assertion may be very restricted. From that viewpoint, 
the following situation is instructive. 


(d) The “ Burette” Universe 


The term "burette" universe here signifies one of a type of situations which not
uncommonly arise in the laboratory, when repeated estimations ring the changes on only 3
consecutive scale divisions consonant with competent workmanship. We thus suppose that the
unit sampling distribution involves only 3 score values, which we may label −1, 0 and +1,
if the scores run consecutively. We shall first suppose that we obtain the central score twice as
often as otherwise so that the specification of the symmetrical unit sample distribution is


Score           −1      0       +1
Frequency       1/6     2/3     1/6


Evidently all Pearson coefficients of odd order have zero value. As an exercise the reader may
check the following:

$${}_{(1)}\beta_2 = 3; \qquad {}_{(1)}\beta_4 = 9; \qquad {}_{(1)}\beta_6 = 27.$$

Whence we obtain for the 10-fold sample, the following remarkable correspondence with the
normal values:

$${}_{(10)}\beta_2 = 3; \qquad {}_{(10)}\beta_4 = 14.94; \qquad {}_{(10)}\beta_6 = 103.41.$$

[Fig. 107. 2-Fold Sample Distributions from Symmetrical Burette Universes.]

[Fig. 108. 3-Fold Sample Distributions from Symmetrical Burette Universes.]

For the 3-class universe with unit scale we may write the u.s.d. more generally as below:

Score           −1       0                     +1
Frequency       p_a      (1 − p_a − p_b)       p_b

If the distribution is symmetrical ($p_a = p_b$) the mean is zero, and

$$m_{2k+1} = 0; \qquad m_{2k} = 2p_a.$$

Thus odd order Pearson coefficients vanish and

$$\beta_2 = (2p_a)^{-1}; \qquad \beta_4 = (2p_a)^{-2}; \qquad \beta_6 = (2p_a)^{-3}.$$


For symmetrical distributions we thus derive results shown in Table 3 which brings into focus 
the issue dealt with in 14.07 : how nearly must the numerical values of the coefficients correspond 
with those of the normal to guarantee a good normal fit ? 


TABLE 3

Definitive      (2/5 + 1/5 + 2/5)       (1/3 + 1/3 + 1/3)       (1/4 + 1/2 + 1/4)       (1/6 + 2/3 + 1/6)       (1/10 + 4/5 + 1/10)
trinomial
Sample size     β2     β4      β6       β2     β4      β6       β2     β4      β6       β2    β4      β6        β2     β4      β6
Unit            1.25   1.56    1.95     1.50   2.25    3.38     2.00   4.00    8.00     3     9.00    27.00     5.00   25.00   125.00
2               2.13   5.08    12.55    2.25   6.19    18.14    2.50   8.50    32.50    3     13.50   74.25     4.00   25.00   212.50
4               2.56   9.24    39.10    2.63   9.98    46.07    2.75   11.50   61.16    3     14.63   95.91     3.50   21.25   184.06
6               2.71   10.98   55.76    2.75   11.52   61.63    2.83   12.61   73.93    3     14.83   100.75    3.33   19.44   163.43
10              2.83   12.50   72.71    2.85   12.85   76.88    2.90   13.54   85.44    3     14.94   103.41    3.20   17.80   142.82
20              2.91   13.72   87.76    2.93   13.90   90.11    2.95   14.26   94.86    3     14.99   104.59    3.10   16.45   124.95
40              2.96   14.35   96.10    2.96   14.44   97.34    2.98   14.63   99.84    3     15.00   104.90    3.05   15.74   115.24


When the distribution is not symmetrical, we may write the moments about zero in the
following form by means of the substitution $p_b = hp_a$:

$$\mu_{2n+1} = p_b(1)^{2n+1} + p_a(-1)^{2n+1} = p_a(h - 1);$$

$$\mu_{2n+2} = p_b(1)^{2n+2} + p_a(-1)^{2n+2} = p_a(h + 1).$$

In virtue of the identity of odd zero moments and of even zero moments, the expressions for mean
moments involve only $\mu_1 = p_a(h - 1)$ and $\mu_2 = p_a(h + 1)$, thus:

$$m_2 = \mu_2 - \mu_1^{2};$$

$$m_3 = \mu_3 - 3\mu_2\mu_1 + 2\mu_1^{3} = \mu_1(1 - 3\mu_2 + 2\mu_1^{2});$$

$$m_4 = \mu_4 - 4\mu_3\mu_1 + 6\mu_2\mu_1^{2} - 3\mu_1^{4} = \mu_2(1 + 6\mu_1^{2}) - \mu_1^{2}(4 + 3\mu_1^{2}).$$

Thus we derive

$$\beta_2 = \frac{(h + 1)\{1 + 6p_a^{2}(h - 1)^{2}\} - p_a(h - 1)^{2}\{4 + 3p_a^{2}(h - 1)^{2}\}}{p_a(h + 1)^{2} - 2p_a^{2}(h^{2} - 1)(h - 1) + p_a^{3}(h - 1)^{4}}.$$


The reader may derive other coefficients as an exercise.

EXERCISE 14.05

1. If $\mu_k$ is the kth zero moment of the unit sample distribution and $\mu_k(r)$ is that of the r-fold sample
distribution with replacement, show that

$$\mu_2(r) = r\mu_2 + r^{(2)}\mu_1^{2};$$

$$\mu_3(r) = r\mu_3 + 3r^{(2)}\mu_1\mu_2 + r^{(3)}\mu_1^{3};$$

$$\mu_4(r) = r\mu_4 + 4r^{(2)}\mu_1\mu_3 + 3r^{(2)}\mu_2^{2} + 6r^{(3)}\mu_1^{2}\mu_2 + r^{(4)}\mu_1^{4}.$$

2. Use the results of 1 to show that the first four zero moments of the distribution whose definitive
binomial is $(q + p)^{r}$ are

$$\mu_1 = rp; \qquad \mu_2 = rp + r^{(2)}p^{2};$$

$$\mu_3 = rp + 3r^{(2)}p^{2} + r^{(3)}p^{3}; \qquad \mu_4 = rp + 7r^{(2)}p^{2} + 6r^{(3)}p^{3} + r^{(4)}p^{4}.$$

3. Find the first eight mean moments of the raw-score distribution whose definitive binomial is
$(q + p)^{r}$.


4. For the 3-class distribution of scores — 1, 0 and + 1 with frequencies p,, p, and p, give the 
values of f,, B», B3 and f, for the r-fold sample mean-score distribution. 


5. Tabulate the values of £4, B,, B3 and f, for the distribution of Exercise 4 when r = 5 andr = 10 
for pa = 4 and p, = 1. 


14.06 MOMENTS OF A DIFFERENCE DISTRIBUTION


In a preliminary way, we have examined in Chapter 6 of Vol I what moment-fitting curve 
is appropriate to describe the distribution of the difference between 2 independent binomial 
variates. In that context our concern was with only the first two Pearson coefficients. We 
shall now take up the same issue from the more general viewpoint of 11.06. 

Before doing so, it is appropriate to clarify an issue we have not as yet dealt with. When 
our concern is to explore the null hypothesis that 2 samples come from one and the same binary 
universe, two courses are open in accordance with the schema below : 


                        Sample A        Sample B        Total

No. of Successes        x_a             x_b             x_a + x_b
No. of Failures         a − x_a         b − x_b         N − x_a − x_b

Total                   a               b               N


In this set-up, our estimate of the proportion of successes in the putative common universe
of the null hypothesis is

$$p_0 = \frac{x_a + x_b}{N}.$$

The estimated mean numbers of successes for the two samples are therefore

$$M_a = a \cdot p_0 \quad\text{and}\quad M_b = b \cdot p_0.$$

Accordingly, the estimated mean value of the raw-score difference $(x_a - x_b) = D$ is

$$M_D = (a - b)\,p_0.$$

The estimated variances for samples of large size are

$$V_a = a \cdot p_0(1 - p_0) \quad\text{and}\quad V_b = b \cdot p_0(1 - p_0).$$

Accordingly, the estimated variance of the difference $(x_a - x_b) = D$ is

$$V_D = (a + b)\,p_0(1 - p_0) = N \cdot p_0(1 - p_0).$$

Whence the square standard raw-score difference is

$$c_D^{2} = \frac{[(x_a - x_b) - (a - b)p_0]^{2}}{N \cdot p_0(1 - p_0)} = \frac{4(b \cdot x_a - a \cdot x_b)^{2}}{N(x_a + x_b)(N - x_a - x_b)}. \qquad \text{(i)}$$

For the corresponding proportionate score difference we may write

$$V_d = \frac{p_0(1 - p_0)}{a} + \frac{p_0(1 - p_0)}{b} = \frac{N \cdot p_0(1 - p_0)}{ab}.$$

Whence we may write for the square standard proportionate score difference

$$c_d^{2} = \frac{N(b \cdot x_a - a \cdot x_b)^{2}}{ab(x_a + x_b)(N - x_a - x_b)}. \qquad \text{(ii)}$$

Accordingly, and if $a = Kb$, we derive

$$\frac{c_d^{2}}{c_D^{2}} = \frac{N^{2}}{4ab} = \frac{(K + 1)^{2}}{4K}. \qquad \text{(iii)}$$

The expression on the right is a minimum when K = 1, i.e. when the samples are equal,
in which case $c_d^{2} = c_D^{2}$. Otherwise, $c_d^{2} > c_D^{2}$. If therefore the distribution of the score difference
is approximately normal in either case, the proportionate score difference will give the higher
assessment of odds against the null hypothesis unless a = b.


In this context it is not inappropriate to refer to a common misconception. Many statistical text- 
books advocate the so-called Chi-Square test for assessing the credentials of the null hypothesis under 
discussion. The fact is that the assessment of odds w.r.t. the proportionate-score difference as pre- 
scribed above is exactly the same as by the Chi-Square test for 1 d.f., Chi-Square in this context being 
the square standard score. For the performance of the latter test, x being the cell score, we define
Chi-Square (C) in terms of the cell score x and the observed mean ($M_x$) of its four values $x_a$, $(a - x_a)$,
$x_b$, $(b - x_b)$ as

$$C = \sum \frac{(x - M_x)^{2}}{M_x}.$$

Thus we write

$$C = \frac{\left[x_a - \dfrac{a(x_a + x_b)}{N}\right]^{2}}{\dfrac{a(x_a + x_b)}{N}}
+ \frac{\left[x_b - \dfrac{b(x_a + x_b)}{N}\right]^{2}}{\dfrac{b(x_a + x_b)}{N}}
+ \frac{\left[(a - x_a) - \dfrac{a(N - x_a - x_b)}{N}\right]^{2}}{\dfrac{a(N - x_a - x_b)}{N}}
+ \frac{\left[(b - x_b) - \dfrac{b(N - x_a - x_b)}{N}\right]^{2}}{\dfrac{b(N - x_a - x_b)}{N}}$$

$$= \frac{(bx_a - ax_b)^{2}}{aN(x_a + x_b)} + \frac{(bx_a - ax_b)^{2}}{bN(x_a + x_b)} + \frac{(bx_a - ax_b)^{2}}{aN(N - x_a - x_b)} + \frac{(bx_a - ax_b)^{2}}{bN(N - x_a - x_b)}$$

$$= \frac{(bx_a - ax_b)^{2}}{N}\left[\frac{(a + b)}{ab(x_a + x_b)} + \frac{(a + b)}{ab(N - x_a - x_b)}\right]$$

$$= \frac{N(bx_a - ax_b)^{2}}{ab(x_a + x_b)(N - x_a - x_b)}.$$

The last expression is identical with (ii); and it is obviously immaterial whether we care to consult
the table of Chi-Square for 1 d.f. in order to assess the significance of the square standard score or that
of the normal integral to assess that of the standard score itself. The two tests must necessarily give the
same result; but will not give the same result as the normal test for the raw-score difference. The
procedure last mentioned will in fact give a more conservative estimate of the odds against the null hypothesis
unless the size of the 2 samples is the same.
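The identity of Chi-Square for 1 d.f. with the square standard proportionate-score difference, and the ratio $N^{2}/4ab$ between the latter and the square standard raw-score difference, may be verified on any fourfold table. The sketch below uses exact fractions; the sample figures (12 successes in 30, 9 successes in 60) are purely hypothetical, and the function names are assumptions of the example.

```python
from fractions import Fraction


def square_standard_scores(xa, a, xb, b):
    """Square standard raw-score and proportionate-score differences."""
    n = a + b
    s = xa + xb                                  # total successes
    cross = (b * xa - a * xb) ** 2
    c2_raw = Fraction(4 * cross, n * s * (n - s))
    c2_prop = Fraction(n * cross, a * b * s * (n - s))
    return c2_raw, c2_prop


def chi_square_fourfold(xa, a, xb, b):
    """Sum of (cell score - expected)^2 / expected over the four cells."""
    n = a + b
    s = xa + xb
    cells = [
        (xa, Fraction(a * s, n)),
        (xb, Fraction(b * s, n)),
        (a - xa, Fraction(a * (n - s), n)),
        (b - xb, Fraction(b * (n - s), n)),
    ]
    return sum((observed - expected) ** 2 / expected for observed, expected in cells)


xa, a, xb, b = 12, 30, 9, 60                     # hypothetical samples
c2_raw, c2_prop = square_standard_scores(xa, a, xb, b)
chi2 = chi_square_fourfold(xa, a, xb, b)

print(float(c2_raw), float(c2_prop), float(chi2))
print(c2_prop / c2_raw, Fraction((a + b) ** 2, 4 * a * b))
```

Here a = 2b, so the ratio of the two square standard scores should be 9/8, the raw-score test giving the more conservative figure.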


In Chapter 4 of Vol. I we have seen reason to suppose that the raw-score difference dis- 
tribution tends to normality for small values of r, and r, more closely than that of the proportionate- 
score difference, which has peculiar features for co-prime samples (Chapter 4, Vol. I). Con- 
sequently, the relative advantage of a test based on the latter procedure is not clear-cut without 
further investigation of the approach of the two distributions to the normal. We may explore 
this by the method of 11.06 without confining our attention to the binomial case. 

We first recall the fact that the difference between a score difference and its mean value
is the difference between the two score deviations, i.e.

$$E(x_a - x_b) = E(x_a) - E(x_b),$$

$$\therefore\ (x_a - x_b) - E(x_a - x_b) = [x_a - E(x_a)] - [x_b - E(x_b)],$$

$$\therefore\ (x_a - x_b) - E(x_a - x_b) = X_a - X_b.$$

We therefore write the kth mean moments of the raw-score and proportionate-score difference
distributions as below:

Raw-score difference:

$$m_k(D) = E(X_a - X_b)^{k}. \qquad \text{(iv)}$$

Proportionate-score difference:

$$m_k(d) = E\left(\frac{X_a}{a} - \frac{X_b}{b}\right)^{k}. \qquad \text{(v)}$$

As in 13.05 we shall write the kth mean moment of the score-sum distribution of the
t-fold sample as $m_k(t)$, and that of the unit sample distribution as $m_k$, using the results of 14.05
to express the former in terms of the latter. For the raw-score difference distribution we
then have

$$m_2(D) = m_2(a) + m_2(b) = (a + b)m_2;$$

$$m_3(D) = m_3(a) - m_3(b) = (a - b)m_3;$$

$$m_4(D) = m_4(a) + 6m_2(a) \cdot m_2(b) + m_4(b)$$

$$= (a + b)m_4 + 3[a^{(2)} + 2ab + b^{(2)}]\,m_2^{2}$$

$$= (a + b)m_4 + 3(a + b)^{(2)}\, m_2^{2}.$$

Assuming in what follows (as in 13.05) that no moments are infinite, we therefore obtain

$$\beta_1(D) = \frac{(a - b)^{2}}{(a + b)^{3}}\,\beta_1; \qquad \beta_2(D) = 3 + \frac{\beta_2 - 3}{a + b}. \qquad \text{(vi)}$$

If S = (a + b), we have shown in 13.05 that

$$\beta_1(S) = \frac{\beta_1}{a + b}; \qquad \beta_2(S) = 3 + \frac{\beta_2 - 3}{a + b},$$

$$\therefore\ \beta_2(D) = \beta_2(S) \quad\text{and}\quad \frac{\beta_1(D)}{\beta_1(S)} = \frac{(a - b)^{2}}{(a + b)^{2}}.$$
When the two samples are of equal size, $\beta_1(D)$ is zero, and in any case $\beta_1(D) < \beta_1(S)$ if
$\beta_1(S) > 0$. In general, therefore


(a) the value of $\beta_2$ for the distribution of the raw-score difference (a − b) is identical with
that of the distribution of the raw-score sum (a + b), rapidly approaching the normal
limiting value ($\beta_2 = 3$) as (a + b) becomes larger;

(b) the value of $\beta_1$ for the distribution of the raw-score difference is nearer to the normal
limiting value of zero than that of the distribution of the raw-score sum unless the latter
distribution is also symmetrical.


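For the proportionate-score difference distribution, we may write in accordance with (v):

$$m_2(d) = \frac{m_2(a)}{a^{2}} + \frac{m_2(b)}{b^{2}} = \frac{m_2}{a} + \frac{m_2}{b} = \frac{(a + b)m_2}{ab}; \qquad \text{(vii)}$$

$$m_3(d) = \frac{m_3(a)}{a^{3}} - \frac{m_3(b)}{b^{3}} = \frac{m_3}{a^{2}} - \frac{m_3}{b^{2}} = -\frac{(a^{2} - b^{2})m_3}{a^{2}b^{2}};$$

$$m_4(d) = \frac{m_4(a)}{a^{4}} + \frac{6m_2(a) \cdot m_2(b)}{a^{2}b^{2}} + \frac{m_4(b)}{b^{4}}$$

$$= \frac{m_4}{a^{3}} + \frac{3a^{(2)} \cdot m_2^{2}}{a^{4}} + \frac{6m_2^{2}}{ab} + \frac{m_4}{b^{3}} + \frac{3b^{(2)} \cdot m_2^{2}}{b^{4}}$$

$$= (m_4 - 3m_2^{2})\,\frac{a^{3} + b^{3}}{a^{3}b^{3}} + 3m_2^{2}\,\frac{(a + b)^{2}}{a^{2}b^{2}}. \qquad \text{(viii)}$$

We thus derive

$$\beta_1(d) = \frac{(a - b)^{2}}{ab(a + b)}\,\beta_1 = \frac{(a - b)^{2}}{ab}\,\beta_1(S), \qquad \text{(ix)}$$

$$\beta_2(d) = 3 + \frac{(a^{2} - ab + b^{2})}{ab(a + b)}\,(\beta_2 - 3). \qquad \text{(x)}$$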

To get the above into clearer perspective, we may again write a = Kb on the assumption
that a is the larger sample, whence K > 1 and

$$\frac{\beta_1(d)}{\beta_1(S)} = \frac{(K - 1)^{2}}{K}, \qquad \text{(xi)}$$

$$\frac{\beta_2(d) - 3}{\beta_2(S) - 3} = \frac{K^{2} - K + 1}{K} = (K - 1) + \frac{1}{K}. \qquad \text{(xii)}$$

From (ix) we see that $\beta_1(d)$ must be zero as for the normal distribution if the samples are of
equal size (a = b and K = 1); but it is not otherwise necessarily less than $\beta_1(S)$. The expression
on the right of (xi) exceeds unity when K exceeds 2.6 approximately. This means that the proportionate-score
difference distribution w.r.t. a-fold and b-fold samples will be less skew than the score-sum
distribution of the (a + b)-fold sample, if the ratio of the two samples does not appreciably
exceed 2.6. Otherwise, it is more skew. The expression on the right of (xii) always exceeds
unity, if K > 1, so that the divergence of $\beta_2(d)$ from the normal value will always be greater
than that of $\beta_2(S)$ and hence greater than that of $\beta_2(D)$. By comparison of (ix) and (vi) we see
that

$$\frac{\beta_1(d)}{\beta_1(D)} = \frac{(K + 1)^{2}}{K} > 1.$$

Unless K = 1 (when both distributions are symmetrical), this means that the
proportionate-score difference distribution is more skew than that of the raw-score. It is evident that $\beta_1$ and
$\beta_2$ for both distributions rapidly approach the normal values of 0 and 3 as b, the size of the
smaller sample, becomes large. For we may write them in the form:

$$\beta_1(D) = \frac{(K - 1)^{2}}{(K + 1)^{3}\,b}\,\beta_1 \quad\text{and}\quad \beta_1(d) = \frac{(K - 1)^{2}}{K(K + 1)\,b}\,\beta_1;$$

$$\beta_2(D) - 3 = \frac{\beta_2 - 3}{(K + 1)\,b} \quad\text{and}\quad \beta_2(d) - 3 = \frac{(K^{2} - K + 1)}{K(K + 1)\,b}\,(\beta_2 - 3).$$

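We may investigate higher Pearson coefficients of the two difference distributions in the
same way, and obtain

$$\frac{\beta_3(D)}{\beta_3(S)} = \frac{(a - b)^{2}}{(a + b)^{2}}; \qquad \frac{\beta_5(D)}{\beta_5(S)} = \frac{(a - b)^{2}}{(a + b)^{2}}; \qquad \text{(xiii)}$$

$$\beta_4(D) = \beta_4(S) - \frac{40ab}{(a + b)^{3}}\,\beta_1; \qquad \text{(xiv)}$$

$$\beta_6(D) = \beta_6(S) - \frac{224ab}{(a + b)^{4}}\,\{\beta_3 + 5(a + b - 2)\beta_1\}. \qquad \text{(xv)}$$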


Thus $\beta_4(D) = \beta_4(S)$ and $\beta_6(D) = \beta_6(S)$ if the unit sample distribution is symmetrical. We
also see that the values of $\beta_3$ and $\beta_5$ for the raw-score difference distribution will always be less
than those of the corresponding coefficients of the score-sum distribution unless the unit sample
distribution is symmetrical, in which case all coefficients of odd order vanish in both cases.
If the unit sample distribution is skew and leptokurtic the values of $\beta_4$ and $\beta_6$ for the raw-score
difference distribution will be smaller than those for that of the score-sum, i.e. the raw-score
difference distribution will be less leptokurtic as well as less skew. Our investigation of the moments of the raw-score
difference distribution thus leads us to the conclusion that it approaches more closely to the
normal than does the score-sum distribution for the (a + b)-fold sample; but we have already
seen that no comparable straightforward statement applies to the difference distribution of the
proportionate score.
Expressions for higher Pearson coefficients of the latter are unwieldy and the derivation is
laborious. Accordingly, we shall cite only approximate expressions, neglecting terms which are
trivial when (a + b) is large:

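$$\beta_3(d) \simeq \frac{10(K - 1)^{2}}{K(K + 1)\,b}\,\beta_1;$$

$$\beta_4(d) \simeq 15 + \frac{15(K^{2} - K + 1)}{K(K + 1)\,b}\,(\beta_2 - 3) + \frac{10(K - 1)^{2}}{K(K + 1)\,b}\,\beta_1;$$

$$\beta_6(d) \simeq 105 + \frac{210(K^{2} - K + 1)}{K(K + 1)\,b}\,(\beta_2 - 3) + \frac{280(K - 1)^{2}}{K(K + 1)\,b}\,\beta_1.$$

Evidently, all the above approach the normal as limiting values when b (and hence also a)
is large.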


14.07 NORMAL APPROXIMATIONS


How nearly the standardised moments of a discrete distribution approach those of the normal
is a purely algebraic issue, if the algebraic pattern of the former is specifiable. How close the
correspondence must be to justify recourse to the normal as a descriptive function is largely an
empirical one; but it may be possible to limit the ground for numerical exploration by reference
to an alternative standard. Many theoretical distributions whose moments are specifiable
(e.g. the Poisson) closely approach the normal with suitable choice of parameters; and if they
are amenable to tabulation it is a simple matter to choose one as a yardstick distribution. For
instance, we may easily derive from the tables of the Poisson function what value of its single
parameter $M$ ensures a percentage error no greater than $\epsilon$ for summation of all values up to the
$2\sigma$ level. We may then ask what conditions ensure that the standardised moments of the
binomial variate defined by successive terms of $(q + p)^r$ lie closer to the normal than do those of
the Poisson yardstick distribution.

When we speak of a binomial variate so defined in the most general sense, i.e. without
restriction on the origin or scale, we assume a set of scores of range $a$ to $a + r\Delta x$ so that
$\binom{r}{x} p^x q^{r-x}$ is the frequency of the score $a + x\Delta x$. If we are speaking of the raw-score of an
r-fold sample from an infinite 2-class universe this means that $a = 0$ and $\Delta x = 1$. If we are
speaking of the proportionate-score deviation of such a sample, $a = -p$ and $\Delta x = r^{-1}$. If we are
speaking of the mean score of the 3-fold toss of a tetrahedral die with face pips 2, 4, 4, 6 we have
$a = 2$ and $\Delta x = \tfrac{2}{3}$. Here it suffices to consider the situation which arises when $a = 0$ and
$\Delta x = 1$, since $a$ does not affect the value of the mean moments and the appropriate power of
$\Delta x$ appears as a scalar factor common to both the numerator and the denominator of the
Pearson coefficients. Since also $(q + p)^{ra}$ defines the sampling distribution of the ra-fold
score-sum from an infinite two class universe and that of the r-fold score-sum from a universe of
$(a + 1)$ classes whose frequencies tally with successive terms of $(q + p)^a$, it will suffice to define
the moments of any binomial universe in terms of those of the universe of 2 classes.

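With that end in view it will be convenient to write $q = mp$, so that $p = (m + 1)^{-1}$, and if
$rp = M$, $r = M(m + 1)$. If $(q + p)$ defines the u.s.d. of score values 0, 1 and deviations $-p$,
$q = (1 - p)$, we may write:

$$\mu_n = q(-p)^n + pq^n = pq\,(q^{n-1} - p^{n-1}) = \frac{m(m^{n-1} - 1)}{(m+1)^{n+1}} \quad (n \text{ odd});$$

$$\mu_n = q(-p)^n + pq^n = pq\,(q^{n-1} + p^{n-1}) = \frac{m(m^{n-1} + 1)}{(m+1)^{n+1}} \quad (n \text{ even}).$$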
Whence

$$\beta_1 = \frac{(m-1)^2}{m}; \qquad \beta_2 = \frac{m^2 - m + 1}{m};$$

$$\beta_3 = \frac{(m-1)^2(m^2+1)}{m^2}; \qquad \beta_4 = \frac{m^4 - m^3 + m^2 - m + 1}{m^2};$$

$$\beta_5 = \frac{(m-1)^2(m^4 + m^2 + 1)}{m^3}; \qquad \beta_6 = \frac{m^6 - m^5 + m^4 - m^3 + m^2 - m + 1}{m^3}.$$


From the above we see that odd Pearson coefficients vanish only if $m = 1$ ($p = \tfrac12 = q$)
as we know; and we can define the value of $m$ which confers normal kurtosis by putting

$$\beta_2 = \frac{m^2 - m + 1}{m} = 3, \quad \text{so that} \quad m^2 - 4m + 1 = 0.$$

The roots of the above are approximately 3.73 and 0.27, corresponding to $p \simeq 0.21$ and 0.79,
within which range $\beta_2 < 3$ and the distribution is platykurtic. Outside this range the
distribution is leptokurtic and skew as is true of the Poisson. For the distribution defined by
successive terms of $(q + p)^r$ in terms of $m$ and $M = rp$ we have the following values of the
Pearson coefficients tending to the Poisson limits and lying consistently between those of the
Poisson distribution and those of the normal in the range $m > 3.73$:
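$${}_r\beta_1 = \frac{(m-1)^2}{Mm(m+1)} \;\to\; \frac{1}{M} \text{ in the limit};$$

$${}_r\beta_2 = 3 + \frac{m^2 - 4m + 1}{Mm(m+1)} \;\to\; 3 + \frac{1}{M} \text{ in the limit};$$

$${}_r\beta_3 = \frac{(m-1)^2(m^2 - 10m + 1)}{M^2m^2(m+1)^2} + \frac{10(m-1)^2}{Mm(m+1)} \;\to\; \frac{1}{M^2} + \frac{10}{M} \text{ in the limit};$$

$${}_r\beta_4 = 15 + \frac{5(5m^2 - 16m + 5)}{Mm(m+1)} + \frac{m^4 - 26m^3 + 66m^2 - 26m + 1}{M^2m^2(m+1)^2} \;\to\; 15 + \frac{25}{M} + \frac{1}{M^2} \text{ in the limit}.$$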
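The closed forms for the first two coefficients admit a direct numerical check. The following is a minimal sketch, not part of the original text: it computes $\beta_1$ and $\beta_2$ of $(q + p)^r$ from the exact binomial frequencies and compares them with $(m-1)^2/Mm(m+1)$ and $3 + (m^2 - 4m + 1)/Mm(m+1)$. The function names and the choice $p = \tfrac15$, $r = 50$ are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def exact_pearson(r, p):
    """beta_1 and beta_2 of the raw-score distribution defined by (q + p)^r."""
    q = 1 - p
    freq = [comb(r, x) * p**x * q**(r - x) for x in range(r + 1)]
    mean = sum(x * f for x, f in enumerate(freq))

    def mu(n):  # n-th moment about the mean
        return sum((x - mean) ** n * f for x, f in enumerate(freq))

    return mu(3) ** 2 / mu(2) ** 3, mu(4) / mu(2) ** 2


def closed_form(r, p):
    """The same coefficients written in terms of m = q/p and M = rp."""
    m = (1 - p) / p
    M = r * p
    beta_1 = (m - 1) ** 2 / (M * m * (m + 1))
    beta_2 = 3 + (m * m - 4 * m + 1) / (M * m * (m + 1))
    return beta_1, beta_2


# Hypothetical illustration: p = 1/5 gives m = 4 (leptokurtic zone); r = 50 gives M = 10.
r, p = 50, Fraction(1, 5)
M = r * p
exact = exact_pearson(r, p)
formula = closed_form(r, p)
poisson_limits = (1 / M, 3 + 1 / M)  # Poisson values for the same M
```

Because the arithmetic uses exact fractions, `exact` and `formula` agree term for term, and both lie between the normal values (0 and 3) and the Poisson values for the same $M$.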


Expressions for Pearson coefficients of higher orders may be obtained in a similar form, 
but are very unwieldy. We here cite the Poisson limiting forms: 


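$$\beta_5 = \frac{105}{M} + \frac{56}{M^2} + \frac{1}{M^3}; \qquad \beta_6 = 105 + \frac{490}{M} + \frac{119}{M^2} + \frac{1}{M^3}.$$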


The above relations presuppose $m > 1$, so that $q > p$ as is true of the Poisson distribution.
Now the histogram of the distribution whose definitive binomial is $(q + p)^r$ is the mirror image
of that of the distribution whose definitive binomial is $(p + q)^r$. For every value $m = k$ in the
range $m > 3.73$, there will thus be in the range $m < 0.27$ a value $m = k^{-1}$ definitive of a
distribution with identical Pearson coefficients. The above relations thus hold good for the range
$m < 0.27$ if we reverse the score order, i.e. we put $M = rq$ when $q < p$. For a given value
of $M = rp$ when $p < q$ (or $M = rq$ when $q < p$), we may therefore say that the variate
whose definitive binomial is $(q + p)^r$ has mean moments nearer to their normal values
than those of the Poisson distribution with the same parameter $M$ if $p$ lies outside the
range stated. Subject to this restriction we may therefore say that the normal is a satisfactory 
fitting curve for a binomial variate if M exceeds the value for which the Poisson distribution 
tallies as closely with the normal as a satisfactory fit implies in the context. Table 4 shows how 
closely the Poisson distribution does in fact correspond to that of the normal at prescribed sig- 
nificance levels for three different values of M. Tables 5 and 6 explore the intermediate zone 
of p values consistent with platykurtosis of a binomial variate. Table 7 shows values of the 
Pearson coefficients for binomial variates with assigned values of M. In Tables 4-6, as in other 
tables of this chapter, the column headed exact under frequency refers to that of the discrete 
distribution, and the column marked normal next to it cites the corresponding ordinate of the 
normal curve. Under cumulative frequency, the column headed normal refers to areas bounded 
by the corresponding ordinate on either side of the mean after making the appropriate half interval 
correction. 
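The comparison tabulated in Table 4 can be sketched as follows. This is an illustrative reconstruction under stated assumptions rather than the procedure used to compile the table: the Poisson cumulative frequency is summed term by term, the normal area is taken from the error function of the standard library, and the half interval correction adds ½ to the raw-score deviation before standardising. The function names are hypothetical.

```python
from math import erf, exp, sqrt


def poisson_cdf(k, M):
    """Sum of Poisson frequencies for scores 0..k when the mean is M."""
    term, total = exp(-M), 0.0
    for x in range(k + 1):
        total += term
        term *= M / (x + 1)
    return total


def normal_cdf(z):
    """Area of the normal curve of unit variance up to the standard score z."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def compare(M, X):
    """Return (X/sigma, Poisson cumulative, normal cumulative) for raw-score deviation X.

    M must be a whole number so that M + X is a possible score; sigma**2 = M.
    """
    sigma = sqrt(M)
    return X / sigma, poisson_cdf(M + X, M), normal_cdf((X + 0.5) / sigma)


standard_score, poisson_area, normal_area = compare(6, 0)
```

For $M = 6$ and zero deviation this gives cumulative frequencies which round to the entries 0.6063 (Poisson) and 0.5809 (normal) cited in the table.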


TABLE 4 


Comparison of Cumulative Frequencies of Normal and Poisson ($M = \sigma^2$) Distributions


Raw-Score 
Deviation xX x 
X pa Poisson | Normal == Poisson | Normal | Poisson | Normal 
os = | SS AOR 

ne a ee O ee ol —3-0984 | 0-0002 | 0:0015 
ES E EA PRE O A A amen —2:8402 | 0-0008 | 0-0034 
E A EI O E ER EOS ee —2:5820 | 0-0027 0-0068 
mre GT E IA S —2:8460 | 0:0005 0:0036 |—2-3238 ¡ 0-0075 0:0141 
saa Seat E A S ee A” O —2-5298 | 0:0028 | 0:0089 |—2-0656 | 0-0179 0:0264 
i eee O SS —2-2136 | 0:0104 0:0199 |—1:8074 | 0-0373 0:0467 
— 6 —2:4495 | 0:0025 0:0124 |—1-8974 | 0:0293 0:0410 |—1-5492 | 0-0697 0:0778 
— 5 —2:0412 | 0-0174 0:0310 |—1-5811 | 0:0671 0:0774 |—1:-2910 | 0-1183 0-1227 
— 4 —1-6630 | 0-0620 0:0765 |—1-2649 | 0:1302 0:1342 |—1:0328 | 0-1846 0:1831 
— 3 —1-2247 | 0112 0:1537 |—0-9487 | 0:2203 0:2146 |—0-7746 | 0-2675 0-2593 
— 2 —0°8167 |. 0-2851 0:2702 |—0-6325 | 0-3329 0:3176 |—0:5164 | 0-3631 0-3493 
— 1 —0-4082 | 0-4457 0:4191 |—0-3162 | 0:4580 0:4382 |—0:2582 | 0-4655 0:4486 
0 0:0000 | 0:6063 0-5809 0:0000 | 0-5831 0-5618 0:0000 | 0-5679 ¡ 0-5514 
1 0:4082 | 0-7440 0:7298 0:3162 | 0-6968 | 0-6824 0:2582 | 0-6639 0:6507 
2 0:8165 | 0-8473 0:8463 0:6325 | 0:7916 0:7854 0:5164 | 0-7486 0:7487 
3 1-2247 | 0:9161 0:9235 0-9487 | 0-8645 0-8658 0-7746 | 0:8192 0:8169 
4 1-6630 | 0:9574 0-9690 1-2649 | 0:9166 0-9226 1-0328 | 0-8749 0:8773 
5 2:0412 | 0:9799 | 0-9876 1-5811 | 0-9513 0:9590 1-2910 | 0-9167 0-9222 
6 2:4495 | 0-9912 0-9960 1-8974 | 0:9730 0:9801 1:5492 | 0:9466 0:9533 
fi 2:8577 | 0:9964 0:9989 2:2136 | 0-9858 0-9911 1-8074 | 0-9670 0:9736 
8 3:2660 | 0-9986 | 0-9997 2-5298 | 0-9929 0:9964 2:0656 | 0:9803 | 0-9859 
9 3-6743 | 0-9995 0-9999 2:8460 | 0-9966 | 0-9987 2-3238 | 0-9886 0:9932 
10 4:0824 | 0-9998 | .:.... 3-1623 | 0-9985 0-9996 2-0820 | 0:9936 0-9966 
11 44906 | 09999 T uz. 3:4785 | 0-9994 0-9998 2:8402 | 0-9965 0-9985 
AS eC ag E E oe ae 3:7947 | 0-9998 | 0-9999 3-0984 | 0-9981 0-9994 
Re ee oe Oe ee 4-T109 1. AA 3:3566 | 0:9990 0-9997 
eee AA A ee ee eee E 3:6168 | 0-9994 0-9999 
A ee ee ree AA A PBI -T POIG I a aie 

eee AP AO PP EE o 4:1312 | 0-9997 

| 


$M = 6$; $\beta_1 = 0.16$; $\beta_2 = 3.16$.     $M = 10$; $\beta_1 = 0.10$; $\beta_2 = 3.10$.     $M = 15$; $\beta_1 = 0.06$; $\beta_2 = 3.06$.


TABLE 5


$(\tfrac12 + \tfrac12)^{20}$; $M = 10$; $\sigma = 2.2361$; $\beta_1 = 0$; $\beta_2 = 2.90$


Raw-Score                     Frequency              Cumulative Frequency
Deviation      X/σ         Binomial    Normal        Binomial    Normal
    X
    0          0.0000      0.1762      0.1784        0.1762      0.1770
  ± 1        ± 0.4472      0.1602      0.1614        0.4966      0.4977
  ± 2        ± 0.8944      0.1201      0.1194        0.7368      0.7344
  ± 3        ± 1.3416      0.0739      0.0725        0.8846      0.8824
  ± 4        ± 1.7888      0.0370      0.0361        0.9586      0.9558
  ± 5        ± 2.2360      0.0148      0.0151        0.9882      0.9861
  ± 6        ± 2.6832      0.0046      0.0049        0.9974      0.9963
  ± 7        ± 3.1305      0.0011      0.0013        0.9996      0.9992
  ± 8        ± 3.5776      0.0002      0.0003        1.0000      0.9999


TABLE 6 


$(\tfrac34 + \tfrac14)^{40}$; $M = 10$; $\sigma = 2.7386$; $\beta_1 = 0.03$; $\beta_2 = 2.983$.


Raw-Score xX Frequency. Cumulative 
Deviation => Frequency 
Ed O 

Binomial Normal Binomial Normal 

— 10 — 3-6510 0-0000 0:0002 0-0000 0-0003 

— 9 — 3-2864 0-0001 0:0007 0-0001 0-0010 

− 8 − 2.9212 0.0009 0.0021 0.0010 0.0031

— 7 — 2:5561 0-0037 0-0056 0-0047 0-0089 

— 6 — 2-1909 0:0113 0:0133 0-0160 0:0223 

— 5 — 1:8258 0-0273 0:0277 0:0433 0-0502 

— 4 — 1:4606 0-0530 0:0499 0:0963 0-1006 
— 3 — 1:0955 0-0857 0-0799 0:1820 0:1806 
− 2 − 0.7303 0.1179 0.1116 0.2999 0.2919

— 1 — 0:3651 0:1398 0:1363 0:4397 0:4275 

0 0:0000 0-1444 0-1457 0-5841 0-5725 

1 0.3651 0.1313 0.1363 0.7154 0.7081

2 0:7303 0:1057 0:1116 0:8211 0:8194 

3 1:0955 0-0759 0:0799 0:8970 0-8994 

| 4 1-4606 0-0488 0-0499 0:9458 0-9498 
5 1:8258 0-0282 0:0277 0:9740 0-9777 

6 2-1909 0:0146 0:0133 0:9886 0-9911 

7 2:5561 0-0069 0-0056 0-9955 0:9969 

8 2:9212 0-0029 0-0021 0:9984 0:9990 

9 3:2864 0-0011 0:0007 0.9995 0:9997 

10 3:6510 0-0004 0:0002 0-9999 0:9999 
TABLE 7


Higher Pearson Coefficients w.r.t. distributions defined by (3 + 3)", ($ + $)", ($ + 4)" and (34% + 345)". 


Bi 

e: 

; 

Ba 1261 | 13:20 | 1520 | 2589 | 13-54 | 13:90 14:99 | 2216 | 14:02 1426 | 1489 | 20-40 
Bs 0 3-68 | 10-21 57-52 0 2:37 6:47 | 35-64 0 1-63 4:40 | 23-99 
Be 73-93 | 90:56 | 11913 | 31037 | 8545 | 9681 | 11473 | 253-27 | 91-65 99:20 | 111-90 | 204-96 


If we take the Poisson distribution for M = 10 as our yardstick of satisfactory fit, we may 
thus say that the following range of values for the Pearson coefficients of a leptokurtic binomial 
variate are consistent with a good fit in the same sense : 


$\beta_1 \le 0.1$; $\beta_2 \le 3.1$; $\beta_3 \le 1.1$.


For symmetrical leptokurtic distributions, the tabulated t-variate (Type VII) dealt with in 15.04
below provides us with a standard of comparison, and we may generate symmetrical platykurtic
yardstick distributions by sampling from the rectangular universe. Table 8 exhibits the close
correspondence between the normal distribution and that of the total score of the 6-fold toss
of a tetrahedral die with face-scores 1, 2, 3, 4. For this distribution $\beta_2 = 2.773$.

The sample distribution of Table 8 is referable to score totals increasing by unit steps from
6 to 24, being therefore a distribution with 19 score classes like the symmetrical binomial variate
defined by successive terms of $(\tfrac12 + \tfrac12)^{18}$. For the latter distribution $M = 9$ and $\beta_2 = 2.89$;
and we may expect a very satisfactory normal fit for a symmetrical platykurtic distribution of
20 or more score classes if $\beta_2 > 2.8$. Needless to say, fitting a continuous curve to a discrete
distribution is an unduly hopeful undertaking if the number of score classes is fewer.
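The reader who wishes to check the figures cited for the 6-fold toss of the tetrahedral die may build the score-sum distribution by repeated application of the product rule. The sketch below is illustrative only; the helper names `convolve`, `n_fold` and `beta_2` are assumptions of this example, and exact fractions are used so that no rounding intrudes.

```python
from fractions import Fraction


def convolve(dist_a, dist_b):
    """Score-sum distribution of one unit sample from each of two universes."""
    out = {}
    for score_a, freq_a in dist_a.items():
        for score_b, freq_b in dist_b.items():
            total = score_a + score_b
            out[total] = out.get(total, 0) + freq_a * freq_b
    return out


def n_fold(unit, n):
    """Score-sum distribution of the n-fold sample from one universe."""
    dist = {0: Fraction(1)}
    for _ in range(n):
        dist = convolve(dist, unit)
    return dist


def beta_2(dist):
    """Second Pearson coefficient: fourth mean moment over squared variance."""
    mean = sum(s * f for s, f in dist.items())
    mu_2 = sum((s - mean) ** 2 * f for s, f in dist.items())
    mu_4 = sum((s - mean) ** 4 * f for s, f in dist.items())
    return mu_4 / mu_2 ** 2


die = {face: Fraction(1, 4) for face in (1, 2, 3, 4)}
six_fold = n_fold(die, 6)
kurtosis = beta_2(six_fold)  # 208/75, i.e. 2.773 to three places
```

The distribution has 19 score classes running from 6 to 24, and the frequency of the central score 15 is 580/4096, which rounds to the first exact entry of Table 8.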

With due regard to the caveat last stated, the rapidity with which the mean score of the sample
approaches normality is remarkable when the distribution is not very skew. Table 9 is instructive
from this viewpoint. It describes the distribution of 14-fold samples from a U-shaped burette
(3-class) universe, i.e. a sample distribution of 29 score classes. The normal estimate of
odds against a value numerically as great as or greater than $2.1\sigma$ is 39 : 1 as against 38 : 1
assigned by the exact distribution. For comparison Tables 10 and 11 give respectively the
10-fold sample mean-score distributions from a skew and slightly platykurtic 3-class universe
and the 8-fold sample mean-score distribution from a skew and slightly leptokurtic 3-class
universe.

In 14.06 we have seen that the raw score and proportionate score difference distributions
of equal (a-fold) samples from an infinite 2-class universe are identical and that their moments
must lie nearer to the normal than do those of the distribution of the 2a-fold sample score sum
or mean score except when $m = 1$ ($p = \tfrac12 = q$), in which case the three distributions are identical.
The accompanying Table 12 exhibits how close is the normal fit for more or less skew binomial
difference distributions, each being referable to the difference between the scores of samples
of 10 from a 2-class universe for $p$ = 0.5, 0.25, 0.20 and 0.10.
TABLE 8


Comparison between the Normal Integral and a 6-fold toss of a Tetrahedral Die.
$\sigma = 2.739$; $\beta_2 = 2.773$


Raw-Score                   Frequency             Cumulative Frequency
Deviation      X/σ       Exact      Normal        Exact      Normal
    X
    0        0.0000      0.1416     0.1457        0.1416     0.1449
    1        0.3651      0.1333     0.1363        0.4082     0.4161
    2        0.7302      0.1113     0.1116        0.6308     0.6386
    3        1.0953      0.0820     0.0799        0.7948     0.7987
    4        1.4604      0.0527     0.0501        0.9002     0.8996
    5        1.8255      0.0293     0.0275        0.9588     0.9555
    6        2.1906      0.0137     0.0137        0.9862     0.9824
    7        2.5557      0.0051     0.0056        0.9964     0.9938
    8        2.9208      0.0015     0.0020        0.9994     0.9981
    9        3.2859      0.0003     0.0007        1.0000     0.9995


TABLE 9 


Comparison between the Normal Integral and the burette sampling distribution specified by $(\tfrac25 + \tfrac15 + \tfrac25)^{14}$.


$\sigma = 3.347$; $\beta_2 = 2.875$

Score-Sum                   Frequency             Cumulative Frequency
Deviation      X/σ       Exact      Normal        Exact      Normal
    X
0 0:0000 0-1173 0-1192 0:1173 0:1180 
1 0:2988 0-1124 0-1140 0:3421 0-3460 
2 0:5975 0-0992 0-0997 0:5405 0-5449 
3 0-8963 0-0801 0-0798 0-7007 0-7043 
4 1-1951 0-0589 0-0584 0:8185 0:8212 
5 1-4939 0:0410 0:0391 0:9005 0-8996 
6 1-7926 0:0247 0-0239 0-9499 0:9478 
7 2-0914 0:0137 0-0134 0:9773 0:9750 
8 2-3902 0-0068 0-0068 0:9909 0-9889 
9 2:6889 0-0030 0-0032 0:9969 0:9954 
10 2-9877 0-0011 0-0014 0:9991 0-9983 
11 3-2865 0-0004 0-0005 0-9999 0-9994 
TABLE 10


Comparison between the Normal Integral and the burette sampling distribution specified by $(\tfrac16 + \tfrac26 + \tfrac36)^{10}$.


$\sigma = 2.357$; $\beta_2 = 2.904$

Score-Sum                    Frequency               Cumulative Frequency
Deviation      X/σ        Exact      Normal          Exact      Normal
    X
 − 31/3     − 4.3841     0.00002    0.00001         0.00002    0.00001
 − 28/3     − 3.9598     0.00013    0.00007         0.00015    0.00009
 − 25/3     − 3.5355     0.00057    0.00033         0.00072    0.00044
 − 22/3     − 3.1113     0.00203    0.00134         0.00275    0.00187
 − 19/3     − 2.6870     0.00601    0.00458         0.00876    0.00667
 − 16/3     − 2.2627     0.01514    0.01309         0.02390    0.02016
 − 13/3     − 1.8385     0.03275    0.03123         0.05665    0.05195
 − 10/3     − 1.4142     0.06106    0.06227         0.11771    0.11466
 −  7/3     − 0.9899     0.09826    0.10368         0.21597    0.21834
 −  4/3     − 0.5657     0.13628    0.14422         0.35225    0.36182
 −  1/3     − 0.1414     0.16214    0.16757         0.51439    0.52820
 +  2/3     + 0.2828     0.16413    0.16262         0.67852    0.68969
 +  5/3     + 0.7071     0.13947    0.13442         0.81799    0.82102
 +  8/3     + 1.1314     0.09748    0.08924         0.91547    0.91044
 + 11/3     + 1.5556     0.05425    0.05047         0.96972    0.96144
 + 14/3     + 1.9799     0.02279    0.02384         0.99251    0.98581
 + 17/3     + 2.4042     0.00651    0.00941         0.99902    0.99555
 + 20/3     + 2.8284     0.00098    0.00310         1.00000    0.99882


TABLE 11

Comparison between the Normal Integral and the burette sampling distribution specified by $(\tfrac1{10} + \tfrac7{10} + \tfrac2{10})^{8}$.

$\sigma = 1.523$; $\beta_2 = 3.04$

Score-Sum                    Frequency               Cumulative Frequency
Deviation      x/σ        Exact      Normal          Exact      Normal
    x
 − 34/5     − 4.4649     0.00001    0.00001         0.00001    0.00002
 − 29/5     − 3.8083     0.00020    0.00019         0.00021    0.00025
 − 24/5     − 3.1517     0.00185    0.00183         0.00206    0.00238
 − 19/5     − 2.4951     0.01138    0.01165         0.01344    0.01513
 − 14/5     − 1.8385     0.04721    0.04834         0.06065    0.06551
 −  9/5     − 1.1819     0.13019    0.13003         0.19084    0.19667
 −  4/5     − 0.5253     0.23196    0.22817         0.42280    0.42192
 +  1/5     + 0.1313     0.26039    0.25969         0.68319    0.67712
 +  6/5     + 0.7879     0.18886    0.19204         0.87205    0.86783
 + 11/5     + 1.4445     0.09104    0.09228         0.96309    0.96186
 + 16/5     + 2.1011     0.02954    0.02881         0.99263    0.99243
 + 21/5     + 2.7577     0.00640    0.00585         0.99903    0.99898
 + 26/5     + 3.4143     0.00089    0.00077         0.99992    0.99991
 + 31/5     + 4.0709     0.00007    0.00007         0.99999    0.99999
TABLE 12


Distributions of Raw-Score or Proportionate (Mean) Score Difference for equal 10-fold samples from more or
less skew 2-class universes, the assumption being that (a) the universe is infinite, or (b) the universe is finite and
sampling is subject to the replacement condition.


$p = \tfrac12 = q$
Raw-Score                   Frequency             Cumulative Frequency
Difference     X/σ       Exact      Normal        Exact      Normal
0 0-0000 0:1762 0:1784 0-1762 0:1770 
1 0-4472 0-1602 0:1614 0-4966 0:4977 
2 0-8944 0-1201 0-1194 0-7368 0-7344 
3 1-3416 0-0739 0:0725 0-8846 0:8824 
4 1-7889 0:0370 0-0361 0-9586 0-9558 
5 2:2361 0:0148 0-0151 0-9882 0-9861 
6 2:6833 0-0046 0-0049 0-9974 0:9963 
7 3:1305 0-0011 0-0013 0-9996 0:9992 
8 3.5777 0.0002 0.0003 1.0000 0.9999
$p = \tfrac14$; $q = \tfrac34$
0 0:0000 0-2056 0:2060 0-2056 0-2038 
1 0:5164 0-1800 0:1801 0:5658 0:5614 
2 1-0328 0-1212 0:1208 0-8080 0-8033 
3 1-5492 0-0625 0:0623 0:9330 0-9293 
4 2:0656 0-0247 0-0246 0:9824 0:9799 
5 2:5820 0-0073 0:0074 0:9970 0-9955 
6 3-0984 0-0016 0-0017 1-0000 0-9992 
7 3:6148 0-0003 0-0003 1-0000 0-9999 
$p = \tfrac15$; $q = \tfrac45$
0 0:0000 0-2238 0:2230 | 0:2238 0-2202 
1 0:5590 0:1909 0:1907 0-6056 0-5982 
2 1-1180 0-1190 0:1194 0-8436 0:8377 
3 1:6770 0-0544 0:0546 0:9524 0:9496 
4 2:2360 0-0184 0-0183 0-9892 0-9881 
5 2-7950 0-0046 0-0045 0-9984 0-9979 
6 3-3540 0-0008 0-0008 1-0000 0:9997 
7 3:9130 0-0001 0-0001 1-0000 
$p = \tfrac1{10}$; $q = \tfrac9{10}$

0 0:0000 0-3126 0:2974 0:3126 0:2907 
1 0:7454 0-2219 0:2252 0-7564 0:7395 
2 1-4907 0:0920 0:0979 0:9404 0:9376 
3 2:2361 0:0246 0:0244 0:9896 0:9909 
4 2-9814 0-0045 0-0034 0-9986 0-9992 
5 3-7268 0-0006 0-0004 0-9998 1-0000 
6 4:4721 0-0001 0:0000 1-0000 


Tables 13 and 14 exhibit score difference distributions for samples of equal size from 
burette (3-class) universes. 
TABLE 13


Comparison between the Normal Integral and the Mean-Score Difference Distribution of 6-fold samples 
drawn from a Burette Universe in which pg = 0'3, h = 2. 


o = 0°5196 B.(d) = 2:877. 

M S | C lati 
Oliwa a Fecauaney Freeney 
Deviation a haces 

d Exact Normal Exact Normal 
0 0:0000 0:13393 0-12796 0-13393 0:12746 
+ 0-3208 0-11254 0-12154 0-35901 0:36959 
2 0-6415 0-11045 0-10415 0-57991 0:57738 
3 0-9623 0:07570 0:08053 0-73131 073840 
4 1-2830 0-06146 0:05624 0:85423 0:85107 
& 1-6040 0:03360 0:03537 0:92143 0:92228 
g 1:9246 0-02250 0:02008 0-96643 0:96293 
z 2:2453 0:00939 0:-01029 0-98521 C:98589 
5 2:5661 0:00513 0:00475 0-99546 0:99358 
2 2-8868 0:00149 0:00198 0-99846 0:99768 
10 3:2076 0-00065 0:00075 0-99976 0:99923 
TA 3:5288 0-00010 0:00025 0-99996 0:99977 
12 3-8492 0-00003 0-00008 1-00000 0:99994 
TABLE 14 


Comparison between the Normal Integral and the Mean-Score Difference Distribution of 4-fold samples 
drawn from a Burette Universe in which pa = 0:1, h = 2. 


o = 0:3808 Ba(d) = 3:038 
Mean-Score J Fre Cumulative 
Difference — ia Frequency 
Deviation g 
d Exact Normal Exact Normal 
0 0-0000 0:26427 0:26191 0:26427 0:25732 
i 0:6565 0-21105 0-21111 0:68637 0:67526 
2 1-3130 0-10933 0-11071 0:90503 0-89928 
2 1-9695 0:03749 0:03766 0-98001 0:97844 
4 2:6260 0:00857 0-00833 0-99715 0:99686 
Š 3-2825 0-00129 0-00120 0:99973 0-99969 
£ 3-9391 0-00012 0-00011 0:99997 0:99998 
z 4:5956 0-00001 0-00001 0:99999 0:99999 


It is very important to recognise the implications of the half interval correction when assessing
the odds in favour of or against a score value equal to or greater than a standard score of unit
variance definitive of a sample from a discrete universe. The variance of the distribution whose
definitive binomial is $(\tfrac12 + \tfrac12)^{16}$ is $rpq = 4$, so that $\sigma = 2$. Thus the standard score corresponding
to a raw score deviation of + 4 ($x = 12$) is 2. With due regard to the half interval, the
corresponding entry in the normal table is $(4 + \tfrac12) \div 2 = 2.25$. The vector probability that a normal
score of unit variance will not exceed + 2.25 is about 0.9875 or odds of over 70 : 1 against.
The vector probability that a normal score of unit variance will not exceed + 2.0 is 0.9773 or
odds of 43 : 1 against.


The exact probability of a score not exceeding 46 in the 10-fold toss of an ordinary cubical
die is 0.9843 (to 4 significant figures). The s.d. of the 10-fold toss distribution is 5.401 about
a mean of 35, so that the corresponding standard score is $(46 - 35) \div 5.401 \simeq 2.037$. With
the half interval correction the corresponding entry in the table of the normal integral will be

$$2.037 + \frac{1}{2(5.401)} \simeq 2.13.$$

For this value the normal integral gives approximately 0.9834, an error of about 1 in 1000. This
makes the odds (vector) against the occurrence:


Exact . . . . . . . . . . . . . . . . . . . . . . . . .   63 : 1
Normal approximation with half interval correction . . .   59 : 1
Normal approximation without half interval correction  .   47 : 1


The reader may find it worth while to calculate the corresponding modular odds. 
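The arithmetic of the die example may be checked by direct enumeration. The following sketch is offered as an illustration, not as the method of the text: it counts the ways of scoring each total in the 10-fold toss, and takes the normal integral from the error function of the standard library. All function and variable names are assumptions of this example.

```python
from math import erf, sqrt


def score_sum_counts(faces, n):
    """Number of ways of obtaining each score-sum in the n-fold toss."""
    counts = {0: 1}
    for _ in range(n):
        nxt = {}
        for total, ways in counts.items():
            for face in faces:
                nxt[total + face] = nxt.get(total + face, 0) + ways
        counts = nxt
    return counts


def normal_cdf(z):
    """Area of the normal curve of unit variance up to the standard score z."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def odds_against(p_not_exceeding):
    """Vector odds against exceeding the score, given P(score <= limit)."""
    return p_not_exceeding / (1 - p_not_exceeding)


counts = score_sum_counts(range(1, 7), 10)
exact = sum(ways for total, ways in counts.items() if total <= 46) / 6 ** 10

sigma = sqrt(10 * 35 / 12)          # s.d. of the 10-fold toss, about 5.401
z = (46 - 35) / sigma               # standard score, about 2.037
with_correction = normal_cdf(z + 0.5 / sigma)
without_correction = normal_cdf(z)
```

Passing each of the three probabilities to `odds_against` gives the vector odds; replacing each probability $P$ by $2P - 1$ beforehand gives the modular odds, since the distribution is symmetrical.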


14.08 SAMPLING FROM DIFFERENT UNIVERSES 


By recourse to the product rule we can infer the distribution of the score-sum or mean score of
p independent samples each taken from a different universe, as well as of p such samples taken
from the same universe. If $G_u(a)$, $G_u(b)$, etc. are the generating functions of the u.s.d. of universe A,
universe B, etc. respectively, and $G_p(s)$ is that of the p-fold score-sum,

$$G_p(s) = G_u(a) \cdot G_u(b) \cdot G_u(c) \ldots \text{etc.} \qquad \text{(i)}$$

If all the universes from which we take each unit sample are identical, as is necessarily so if we
take them (with replacement) from one and the same universe, we may write $G_u(a) = G_u = G_u(b)$,
etc. and

$$G_p(s) = G_u^{\,p} \qquad \text{(ii)}$$


The following example illustrates the meaning of (i) : 


                 Universe A                               Universe B
Score          3      4      5      6          −1/3    5/3    11/3    17/3    23/3
Frequency    4/16   4/16   4/16   4/16         1/16   4/16    6/16    4/16    1/16


To exhibit the product rule we may lay out our grid for the score-sum of unit samples from
each universe as follows:


                Cell Scores (× 3)              Cell Frequencies (× 16²)
Score A:       3     4     5     6               3     4     5     6
Score B
  −1/3         8    11    14    17               4     4     4     4
   5/3        14    17    20    23              16    16    16    16
  11/3        20    23    26    29              24    24    24    24
  17/3        26    29    32    35              16    16    16    16
  23/3        32    35    38    41               4     4     4     4


The score-sum distribution is thus:


Score (× 3)          8    11    14    17    20    23    26    29    32    35    38    41
Frequency (× 64)     1     1     5     5    10    10    10    10     5     5     1     1

For the generating functions we may write

$$G_u(a) = \tfrac14\left(e^{3h} + e^{4h} + e^{5h} + e^{6h}\right);$$

$$G_u(b) = \tfrac1{16}\left(e^{-h/3} + 4e^{5h/3} + 6e^{11h/3} + 4e^{17h/3} + e^{23h/3}\right).$$

By direct multiplication we obtain

$$G_u(a) \cdot G_u(b) = \tfrac1{64}\left(e^{8h/3} + e^{11h/3} + 5e^{14h/3} + 5e^{17h/3} + 10e^{20h/3} + 10e^{23h/3} \ldots \text{etc.}\right)$$


Each coefficient in the above corresponds to one of the frequency terms of the sum distribution,
and the cofactor of h in the exponent of e is the corresponding score-sum itself on the assumption
that we take 2 unit samples one from each universe. It is very important to recognise that this
is not the same thing as taking a 2-fold sample from the equivalent homogeneous (Bernoullian)
universe. Since our sample prescription is that we take equal (unit) samples from each stratum
we can conceptualise a corresponding destratified universe on the assumptions: (a) that the two
strata each contain the same number (16) of items and the common pool contains 32 in all;
(b) that we replace each item before drawing another. The u.s.d. of this pooled universe is then


Score        −1/3    5/3     3     11/3     4      5     17/3     6     23/3
Frequency    1/32   4/32   4/32    6/32   4/32   4/32    4/32   4/32    1/32


For such a Bernoullian universe we may write

$$G_u = \tfrac1{32}\left(e^{-h/3} + 4e^{5h/3} + 4e^{3h} + 6e^{11h/3} + 4e^{4h} + 4e^{5h} + 4e^{17h/3} + 4e^{6h} + e^{23h/3}\right).$$

The generating function of the 2-fold sample from the same universe is $G_u^2$, from which we
derive by direct multiplication the following score-sum distribution as the reader may check
by recourse to the grid:


Score Frequency Score Frequency 

(x 3) (x 1024) (x 3) (x 1024) 
— 2 1 26 80 
4 8 27 64
8 8 28 56 
10 28 29 80 
11 8 30 48 
14 40 32 40 
16 56 33 32 
17 40 34 28 
18 16 35 40 
20 80 36 16 
21 32 38 8
22 70 40 8 
23 80 41 8 
24 48 46 1 
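The contrast between the two prescriptions can be exhibited by multiplying out the distributions mechanically. The sketch below is an illustration under stated assumptions: scores are multiplied by 3 so that all are whole numbers, frequencies are kept as whole-number weights, and the names `convolve`, `universe_a`, `universe_b` and so on are introduced here for convenience.

```python
from collections import Counter


def convolve(dist_a, dist_b):
    """Product rule: weights of each score-sum of one unit sample from each."""
    out = Counter()
    for score_a, weight_a in dist_a.items():
        for score_b, weight_b in dist_b.items():
            out[score_a + score_b] += weight_a * weight_b
    return dict(out)


# Scores (x 3) with weights out of 16 for each stratum.
universe_a = {9: 4, 12: 4, 15: 4, 18: 4}
universe_b = {-1: 1, 5: 4, 11: 6, 17: 4, 23: 1}

# One unit sample from each stratum: weights out of 16 * 16 = 256.
stratified = convolve(universe_a, universe_b)

# Pooled (destratified) universe of 32 items, then a 2-fold sample from it:
# weights out of 32 * 32 = 1024.
pooled = Counter(universe_a)
pooled.update(universe_b)
pooled_pair = convolve(pooled, pooled)
```

The stratified prescription yields only 12 score classes, from 8 to 41 in units of one third, whereas the 2-fold sample from the pooled universe yields 28, from −2 to 46, with the weights set out in the table above.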


The foregoing example shows that : (a) the score-sum distribution w.r.t. unit samples from 
each of p strata of a stratified universe is very different from that of p successive samples from 
the equivalent unstratified universe; (b) the specification of the equivalent Bernoullian universe
admits of no simple recipe in so far as difference of origin as well as scale w.r.t. the unit sample 
distribution of the strata determine its character. It is, however, possible to probe the issue 
further, if we make the assumption that each stratum u.s.d. has zero mean. 

To bring this assumption to life, we may suppose that each universe of the foregoing example 
is a lottery wheel like that of Fig. 97 assigned to one of two players and then consider the result 
of recording the score of a 2-fold spin of each. At each trial we thus record two scores, $a_1$, $a_2$
w.r.t. lottery wheel A and $b_1$, $b_2$ w.r.t. lottery wheel B. Instead of allocating to the player the
score-sum, we may record the result as the paired difference scores of each of them, viz.:
$d_a = a_1 - a_2$ and $d_b = b_1 - b_2$. In 14.02 we have seen that the distribution of both sets
of paired d-scores must be symmetrical about zero mean; and we may write their generating
functions as follows:

$$G(d_a) = \tfrac1{16}\left(e^{3h} + e^{4h} + e^{5h} + e^{6h}\right)\left(e^{-3h} + e^{-4h} + e^{-5h} + e^{-6h}\right);$$

$$G(d_b) = \tfrac1{256}\left(e^{-h/3} + 4e^{5h/3} + 6e^{11h/3} + 4e^{17h/3} + e^{23h/3}\right)\left(e^{h/3} + 4e^{-5h/3} + 6e^{-11h/3} + 4e^{-17h/3} + e^{-23h/3}\right).$$

The above reduce to a more convenient form for purposes of differentiation:

$$G(d_a) = \frac1{16}\left(\frac{e^{2h} - e^{-2h}}{e^{h/2} - e^{-h/2}}\right)^2; \qquad \text{(iii)}$$

$$G(d_b) = \left(\frac{e^{h} + e^{-h}}{2}\right)^8. \qquad \text{(iv)}$$


Thus our new distributions are as below : 


                 Frequencies
Scores       d_b      d_a     Total

 − 8           1        0        1
 − 6           8        0        8
 − 4          28        0       28
 − 3           0       16       16
 − 2          56       32       88
 − 1           0       48       48
   0          70       64      134
   1           0       48       48
   2          56       32       88
   3           0       16       16
   4          28        0       28
   6           8        0        8
   8           1        0        1
 Total       256      256      512
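The two columns of d-score frequencies follow from pairing every score of a wheel with every other score of the same wheel. The sketch below illustrates the operation; the function name `difference_distribution` is an assumption of this example, and the weights are taken out of 16 for each wheel so that the paired weights are out of 256.

```python
from fractions import Fraction


def difference_distribution(unit):
    """Weights of d = (first score - second score) for two spins of one wheel."""
    out = {}
    for first, weight_1 in unit.items():
        for second, weight_2 in unit.items():
            d = first - second
            out[d] = out.get(d, 0) + weight_1 * weight_2
    return out


# Wheel A: scores 3..6, each with weight 4 out of 16.
wheel_a = {3: 4, 4: 4, 5: 4, 6: 4}
# Wheel B: scores -1/3, 5/3, ..., 23/3 with binomial weights out of 16.
wheel_b = {Fraction(-1, 3): 1, Fraction(5, 3): 4, Fraction(11, 3): 6,
           Fraction(17, 3): 4, Fraction(23, 3): 1}

d_a = difference_distribution(wheel_a)   # scores -3..3
d_b = difference_distribution(wheel_b)   # scores -8..8 in steps of 2

# Composite column: equal weighting of the two strata.
total = {d: d_a.get(d, 0) + d_b.get(d, 0) for d in sorted(set(d_a) | set(d_b))}
```

Both d-score distributions are symmetrical about zero although the two wheels differ in origin and scale, which is the property the ensuing argument relies on.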


The results of recording pair difference scores from the two lottery wheels with border- 
scores as specified at the beginning of this section would, of course, be the same as those of 
recording the single score distribution of lottery wheels with score distributions respectively 
identical with those of d, and d, above. We then have again a universe of 2 strata but this time 
the means of the two distributions are identical, both being zero. On the assumption that we 
take samples of equal size from each stratum, the equivalent homogeneous universe is as specified 
by equal weighting in the column marked “Total”. In this composite distribution each frequency
term is the sum of the corresponding terms of the stratum distributions divided by twice the
common total, i.e. each frequency is the mean of the corresponding frequencies of the two d-score
distributions. Thus we can express the m.g.f. of the composite homogeneous universe by the
relation


$$G(c) = \tfrac12\left[G(d_a) + G(d_b)\right].$$


We can extend this result by iteration to any number of strata on the same assumption, viz. that 
the stratum distributions all have zero mean. For p strata we have 


$$G(c) = \frac1p \sum_{x=1}^{x=p} G(d_x) \qquad \text{(v)}$$


It follows from (v) that the moments of the u.s.d. of the Bernoullian universe will be the mean
of the corresponding stratum moments. If we take one sample from each stratum the m.g.f.
of the distribution of the p-fold score-sum is as given by (i), viz.:


$$G(s) = \prod_{x=1}^{x=p} G(d_x) \qquad \text{(vi)}$$


If we take p independent unit samples from the composite homogeneous universe, the m.g.f.
of the score-sum distribution is


$$G^p(c) = \left[\frac1p \sum_{x=1}^{x=p} G(d_x)\right]^p \qquad \text{(vii)}$$

If the stratum distributions are normal, (vi) becomes


$$G(s) = \prod_{x=1}^{x=p} e^{\frac12 V_x h^2}.$$


If $V = (V_a + V_b + V_c \ldots)$ this is equivalent to

$$G(s) = e^{\frac12 V h^2} \qquad \text{(viii)}$$


The last expression defines the m.g.f. of a normal distribution of variance V. Subject to the
same assumption (vii) becomes


$$G^p(c) = \left[\frac1p \sum_{x=1}^{x=p} e^{\frac12 V_x h^2}\right]^p \qquad \text{(ix)}$$

The last expression evidently cannot reduce to the same form as (viii) unless $V_a = V_b = V_c \ldots$
etc. in which case


$$G(c) = \frac1p \sum_{x=1}^{x=p} e^{\frac12 V_a h^2} = e^{\frac12 V_a h^2}. \qquad \text{(x)}$$

This is the m.g.f. of a normal universe of variance $V_a = V_b$, etc. and that of the p-fold score-sum
distribution is

$$G^p(c) = e^{\frac12 p V_a h^2}. \qquad \text{(xi)}$$

We thus see that the distribution of the sum of p paired difference scores from different 
normal universes is necessarily normal, but it is equivalent to the distribution of the p-fold 
sample score-sum from the homogeneous pooled universe only if one assumption holds good. 
If the sub-universes (strata) from which we extract our score-pairs differ w.r.t. the value of the 
mean only, the distribution of the paired difference having zero mean will be identical from 
stratum to stratum, and taking p samples from any one of them amounts to the same as taking
one sample from each of the p strata. If so, we can look on each d-score as a unit sample from
one and the same normal universe, and since the d-score has zero mean, an unbiased estimate
of its variance (V) based on p samples is


$$V \simeq \frac1p \sum_{x=1}^{x=p} d_x^2. \qquad \text{(xii)}$$


For the variance of the mean score of a p-fold sample from such a universe we have as our
unbiased estimate


$$V_m \simeq \frac1{p^2} \sum_{x=1}^{x=p} d_x^2. \qquad \text{(xiii)}$$

The mean d-score itself is

$$M_d = \frac1p \sum_{x=1}^{x=p} d_x;$$


$$\therefore \frac{M_d^2}{V_m} = \frac{\left(\sum d_x\right)^2}{\sum d_x^2}. \qquad \text{(xiv)}$$

The justification for the derivation of (xiv) is that the d-scores constitute a homogeneous normal 
universe whose u.s.d. variance is definable in the usual way. 

We shall now assume that the variances of the d-score distributions are not identical, in
which case their p-fold score-sum distribution is still normal in accordance with (viii), the
variance being


$$V = V_a + V_b + V_c + \ldots \qquad \text{(xv)}$$


If we denote the score-sum by s, we have $s = pM_d$, and derive the variance ($V_m$) of the mean score
deviation by the usual scalar transformation:


$$V_m = \frac{V}{p^2} \qquad \text{(xvi)}$$


To evaluate V in terms of our observations, we note that we have one sample score d, from which 
to estimate V,, one sample score d, from which to estimate V, and so on. If we did not know 
that the true mean of each of the d-score distributions is zero, we could not do this; but our 
assumption is that we do have this knowledge. To appreciate the implications of this let us
make explicit the distinction between the true mean ($M_u$) and the sample mean ($M_s$) of an r-fold
sample of scores (x), employing the symbol $E_u(\ldots)$ to signify the operation of extracting the mean
of the complete sampling distribution. We may then write


$$\hat\sigma^2 = \frac{\sum (x - M_u)^2}{r}, \qquad E_u(\hat\sigma^2) = V_u; \qquad \text{(xvii)}$$

$$s^2 = \frac{\sum (x - M_s)^2}{r - 1}, \qquad E_u(s^2) = V_u. \qquad \text{(xviii)}$$

* More fully, if $E_s(\ldots)$ is the operation of extracting the sample mean and $E_u \cdot E_s = E = E_s \cdot E_u$:


$V_u = E(x - M_u)^2 = E(x^2)$ if $M_u = 0$;


$\hat\sigma^2 = E_s(x - M_u)^2 = E_s(x^2)$ if $M_u = 0$.
Whence if $M_u = 0$
$E_u(\hat\sigma^2) = E_u \cdot E_s(x^2) = E(x^2) = V_u$.


630 CHANCE AND CHOICE BY CARDPACK AND CHESSBOARD 


In (xviii) $M_s = x$ if we have only one sample value of x and $(x - M_s) = 0$; but if so $r = 1$ and
$(r - 1) = 0$, whence the value of $s^2$ is indeterminate. If we know that $M_u = 0$ in (xvii), $\hat\sigma^2$ is an
unbiased estimate of $V_u$. We may thus write for our distribution of unit sample d-scores from
different universes


$$E_u(d_a^2) = V_a, \quad E_u(d_b^2) = V_b, \text{ etc.}; \qquad \therefore E_u\Big(\sum d_x^2\Big) = V_a + V_b + \ldots = V.$$

In accordance with (xvi) our unbiased estimate of $V_m$ is therefore

$$\frac{\sum d_x^2}{p^2}.$$

As before, we may write the square of the mean d-score as

$$M_d^2 = \frac{\left(\sum d_x\right)^2}{p^2}.$$


Whence we derive the empirical square critical ratio of the mean d-score, i.e. square standard
mean d-score, in the same form as (xiv), viz.:

$$\frac{M_d^2}{V_m} = \frac{\left(\sum d_x\right)^2}{\sum d_x^2} \qquad \text{(xix)}$$
Thus the rationale of the approximate (large sample) c-test for paired differences does not 
necessitate the assumption that each paired difference is a unit sample from a universe with the same 
variance as each other such universe. 
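The square critical ratio of the mean d-score, namely the square of the sum of the d-scores divided by the sum of their squares, is simple to compute. The following is a minimal sketch; the function name and the ten d-scores are hypothetical and serve only to show the arithmetic.

```python
from math import sqrt


def square_critical_ratio(d_scores):
    """(sum of d)^2 / (sum of d^2): the square standard mean d-score.

    Assumes each d-score is a paired difference whose true mean is zero
    on the null hypothesis, so that d^2 estimates its own variance.
    """
    total = sum(d_scores)
    sum_of_squares = sum(d * d for d in d_scores)
    return total ** 2 / sum_of_squares


# Hypothetical paired differences for p = 10 pairs.
d_scores = [2, -1, 3, 1, 2, 0, 4, -2, 1, 2]
ratio = square_critical_ratio(d_scores)   # 144 / 44
c = sqrt(ratio)                           # critical ratio referred to the normal table
```

Since the ratio involves only the sum and the sum of squares of the d-scores, the number of pairs p cancels out of the expression.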


EXERCISE 14.08 


1. Four players each toss one of 4 tetrahedral dice with face scores respectively as follows : 
LSR oe ee 2d 2,58 4-3 00 
Cite the mean and the variance of 
(a) the u.s.d. of each player’s score ; 
(b) the distribution of the score-sum of each player’s double toss ; 


(c) the distribution of the score difference of each player’s double toss. 


2. Write down the m.g.f. of each of the distributions (a)-(c) above, and specify the frequencies 
of possible score values. 


3. Repeat Exercises 1 and 2 above for the case when the dice have face scores as follows : 
1,2343: IESO 335,32) 2 


4. Compare the results of Exercises 1-3 above, and draw your own conclusions w.r.t. what features 
of the u.s.d. are relevant to the character of the 2-fold toss difference distribution. 


14.09 THE USE OF PAIRED DIFFERENCES


The considerations advanced in 14.08 have a special bearing on the merits and disadvantages 
of pairing observations in one or other way mentioned below. In general, pairing in virtue of 
a similarity relevant to the end in view is always a wise procedure, if the outcome of the experiment 
is so clear-cut that need for statistical analysis does not arise. Otherwise, it is important to 
realise that pairing presupposes sampling in a stratified universe and sampling in a stratified
universe may be intractable from the statistical standpoint. 

When we pair data, we conceive the universe as potentially stratified w.r.t. 2 criteria of 
classification, which we may specify as: (i) treatment (columns); (ii) within-pair resemblance 
(rows). The following possibilities then arise : 


Case 1. The universe is homogeneous in both dimensions;

Case 2. The universe is homogeneous in the row dimension alone, i.e. from pair to pair for
one and the same treatment;

Case 3. The universe is homogeneous in the column dimension alone, i.e. for different treatments
on members of the same pair;

Case 4. The universe is homogeneous in neither dimension, in which event we have no guarantee
that the composite sample will provide more than a single sub-sample for any relevant
parameter.


Cases 1 and 2 are trivial in this context, since we accomplish (and lose) nothing by pairing 
when corresponding (within column) members of different pairs are unit samples from identical 
sub-universes. Within the framework of appropriate assumptions, Case 3 and Case 4 may each 
be reducible to Case 1 if we employ the method of scoring by paired differences; but there 
appears to be prevalent some misconception about what the appropriate assumptions are. As 
regards Case 3, the relevant issue comes into focus when we examine a distinction between the 
following models. Each type consists of a series of different dice, each of which a player tosses 
twice. We thus have a pair of score values for each die and the stratified universe of which 
our paired scores are stratum-samples is homogeneous in one dimension in virtue of the fact 
that each member of a pair is a unit sample from the same stratum : 


Model I. Four players each toss twice one of four tetrahedral dice with face-scores 
respectively as follows : 


1, 2, 2, 3        2, 3, 3, 4        3, 4, 4, 5        4, 5, 5, 6


The variances of the single toss distributions are the same for each die; but the mean scores 
are different, viz. 2, 3, 4 and 5 respectively. For the 2-fold toss the variance of each player’s 
score difference distribution is the same, being unity and the mean is zero in each case. 


Model II. Four players each toss twice one of 4 tetrahedral dice with face-scores respectively
as follows:
MD 4 Bee E A A a 


The means of the single toss distributions (2, 4.5, 5, 4) are different, as are also the variances
(0.5, 1.25, 2, 4.5). For the 2-fold toss, the mean of each player's difference score is the same,
being zero; but the variances are different, being respectively 1, 2.5, 4, 9.

In both series, the mean scores for the 2-fold toss are sample scores from strata with different
definitive parameters; but this is not true of the score differences as we see from the following
lay-out in which M is the expected mean score of the stratum and $\sigma_M^2$ the variance of its
distribution; $d_m$ is the expected score difference and $\sigma_d^2$ the variance of its distribution.

            Model I                                Model II
  M     $\sigma_M^2$    $d_m$    $\sigma_d^2$          M     $\sigma_M^2$    $d_m$    $\sigma_d^2$
  2     0.250     0      1             2     0.250     0      1
  3     0.250     0      1             4.5   0.625     0      2.5
  4     0.250     0      1             5     1.0       0      4
  5     0.250     0      1             4     2.25      0      9


Though we have here the variance only, it goes without saying that all the mean moments 
of the stratum d-score distributions are identical for Model I. Thus the stratum difference
score distributions are in fact identical. Since sampling from n identical strata is equivalent to 
sampling with replacement from any one of them or sampling from any one of them without 
replacement on the assumption that each stratum is indefinitely large, the Model I universe of 
d-scores is indeed a homogeneous universe. 

In what circumstances we can consider Case 4 as a homogeneous universe will now be 
evident. Our new model will be 


Model III

  Player A tosses once a tetrahedral       Player B tosses once a tetrahedral
  die with face scores                     die with face scores
    1    2    2    3                         3    4    4    5
    3    4    4    5                         5    6    6    7
    8    9    9   10                        10   11   11   12
   16   17   17   18                        18   19   19   20

For this model we have

   M     $\sigma_M^2$    $d_m$    $\sigma_d^2$
   3     0.25      2      1
   5     0.25      2      1
  10     0.25      2      1
  18     0.25      2      1
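
The tabulated values for Model III can be checked by enumerating the 16 equally likely outcomes of each pair of dice. The sketch below does so with the standard library; the names `MODEL_III` and `pair_summary` are illustrative only, and the difference is taken as Player B's score minus Player A's, which is an assumption about sign that reproduces the tabulated value of 2.

```python
from itertools import product
from statistics import mean, pvariance

# Face scores of Model III: (Player A's die, Player B's die) for each row-stratum.
MODEL_III = [
    ((1, 2, 2, 3), (3, 4, 4, 5)),
    ((3, 4, 4, 5), (5, 6, 6, 7)),
    ((8, 9, 9, 10), (10, 11, 11, 12)),
    ((16, 17, 17, 18), (18, 19, 19, 20)),
]


def pair_summary(die_a, die_b):
    """Enumerate the 16 equally likely (a, b) outcomes of one toss of each die.

    Returns the mean and variance of the pair mean, then the mean and
    variance of the difference b - a.
    """
    outcomes = list(product(die_a, die_b))
    pair_means = [(a + b) / 2 for a, b in outcomes]
    differences = [b - a for a, b in outcomes]
    return (mean(pair_means), pvariance(pair_means),
            mean(differences), pvariance(differences))


summaries = [pair_summary(a, b) for a, b in MODEL_III]
for row in summaries:
    print(row)
```

The pair means differ from stratum to stratum, whereas the mean and variance of the difference are the same in every row, which is the sense in which the universe of d-scores is homogeneous.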


The important common property of Models I and III, viz.: that the universe of d-scores 
is homogeneous, arises from the fact that the row-stratum distributions differ with respect to 
origin alone. This means that there is one source of residual variation. In terms of experimental 
design, such an assumption is admissible when pair to pair variation is attributable to instru- 
mental error as in short-term experiments involving observations of the same subject before and 
after treatment. The expression short-term in this context carries with it the implication that 
no source of relevant individual variation obtrudes within the interval separating successive 
determinations. As a straightforward example of such a situation, we may consider a set of 
paired determinations of blood calcium level respectively carried out on different individuals 
immediately before injection of a fixed dose of parathyroid extract and half an hour later. If 
we view the issue within the traditional framework of the unique null hypothesis, our assumptions 
are then:

(a) that the scores for successive samples within the period stated differ in virtue of errors
of observation only;

(b) this source of variation is common to all pairs of observations;

(c) the mean (true) value of the blood calcium level either before or after treatment is also
variable from individual to individual in virtue of nature and/or nurture;

(d) the expected values of the blood calcium level before and after treatment are identical
for one and the same individual.

A method of pairing commonly practised and commonly advocated has implications very
different from the foregoing. As an example we may consider its use when the end in view is 
to decide whether addition of a dietetic component to the ration of a mixed population is bene- 
ficial, i.e. growth-promoting, an enquiry which typifies most situations in which the inclination 
to pair observations will obtrude in prophylactic or therapeutic trials. Here the observations 
paired are observations on different individuals chosen because they are alike w.r.t. age, sex, 
body-build, etc. In such circumstances we may have good reasons for believing that variance 
w.r.t. growth rate differs sensibly among individuals of opposite sex, of different age groups 
or of dissimilar physique. If so, the d-score distribution varies from row-stratum to row-stratum, 
and the postulates of neither Model I nor Model III hold good. In any case, it will be difficult 
to justify our confidence that they do so without recourse to ad hoc empirical data. An experiment 
of Cushny cited by Gosset in his original publication on the t-distribution is a somewhat
unfortunate example of before-and-after-treatment pairing of observations. Here the object 
is to compare two optical isomers of a soporific drug by successive administration to the same 
individual ; but the interval between the observations is inescapably protracted and presumptive 
major sources of variation arise less from liability to error of observation than from change of 
physical conditions. 


CHAPTER 15


SAMPLING DISTRIBUTIONS


15.01 THE FUNDAMENTAL DISTRIBUTIONS


In a wide range of statistical problems our concern is to explore the implications of the assumption
that parameters estimated from samples are consistent with the null hypothesis that the latter
come from the same universe or universes of identical structure. Such indices include 
sample means, sample variances, variances of sample means, the ratio of the sample mean to 
the sample variance, and so forth. What conclusions we can legitimately make w.r.t. their 
consistency presupposes knowledge of their distribution. If we can find an exact expression for 
the distribution of a parameter estimate, it will usually be possible to construct a table of its 
integral for ready reference with a view to performing a test of significance ; and expressions 
for a wide range of such sample distributions are in fact deducible, if we assume that the normal 
is the u.s.d. They embrace all those which the school of R. A. Fisher invokes to test the 
significance of estimates of variance dealt with in Chapter 13, and of regression in Chapter 17. 

Indeed, all the significance tests we shall subsequently explore rest on the assumption that 
the putative common universe of the null hypothesis is normal. We have already had occasion 
to remind ourselves that this postulate is at best a good approximation. So it is not necessary 
to emphasise that it is a convenient fiction. In the derivations which follow we may assume its 
truth regardless of its relevance to reality ; and confine our attention to issues which are alge- 
braical rather than factual. From the algebraic viewpoint, each significance test which will 
subsequently engage our attention is referable to one or other of a family of curves 
extensively investigated for the first time by Karl Pearson. 

The common pattern* which Pearson disclosed has little relevance to the end we here have 
in view, and the numbers he attached to the types themselves throw no light on their place in 
modern sampling theory. What is more important from our viewpoint is: (a) the relation 
in which they stand to the normal curve, as indicated symbolically in Fig. 109 ; (b) the fact that 
it is possible to define their properties uniquely in terms of the first 4 moments. In so far as our 
concern is with their role in sampling theory, we may summarise the familial relationship of the 
relevant types as follows : 


(i) With appropriate choice of constants, Type III describes the distribution of the sum 
of n independent square normal scores of zero mean, and is therefore of special relevance 
to the specification of the distribution of sampling variance, as set forth in 16.01-16.03 ; 

(ii) Type VI describes the distribution of the ratio of two independent Type III variates, 
and is therefore of special relevance to the construction of a test (the F-test) for con- 
sistency between independent estimates of variance ; 

(iii) Type VII (on which the t-test mentioned in Chapter 7 of Vol. I relies) describes the
distribution of the square root of a particular Type VI variate, and therefore stands in 
the same relation to the latter as does the normal distribution to the simplest case 
(Chi-Square for 1 d.f.) of Type III; 

(iv) Type I describes the ratio of a Type III variate A to the sum of two independent Type
III variates A and B. It is a good descriptive curve for the distribution of the raw
score of large samples drawn from a 2-class universe without replacement;


* Pearson developed the equation of a distribution embracing all his Types as special cases from consideration of 
sampling without replacement from a finite 2-class universe, and placed Type I at the head of the list for that reason. 
(v) Type II is merely the symmetrical form Type I assumes as a particular case and is a
good descriptive curve for the sample distribution of the product moment coefficient 
r from a bivariate normal universe, when its true value is zero. 
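
The familial relationships listed above lend themselves to a quick check by simulation. The following sketch, which is only an illustration and not part of the original argument, builds Type III variates as sums of squared standard normal scores and then forms the ratio A/B (Type VI) and the ratio A/(A + B) (Type I). The degrees of freedom (4 and 6), the number of trials and the seed are arbitrary choices. For these choices the theoretical means are 4, 4/(6 − 2) = 1 and 4/(4 + 6) = 0.4 respectively.

```python
import random


def chi_square(n, rng):
    """Sum of n independent squared standard normal scores (a Type III variate)."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))


def simulate(a_df=4, b_df=6, trials=20000, seed=1):
    """Return the sample means of A, A/B and A/(A+B) over repeated trials."""
    rng = random.Random(seed)
    type_iii, type_vi, type_i = [], [], []
    for _ in range(trials):
        a = chi_square(a_df, rng)
        b = chi_square(b_df, rng)
        type_iii.append(a)
        type_vi.append(a / b)        # ratio of two independent Type III variates
        type_i.append(a / (a + b))   # ratio of A to the sum of A and B

    def average(values):
        return sum(values) / len(values)

    return average(type_iii), average(type_vi), average(type_i)


mean_iii, mean_vi, mean_i = simulate()
print(mean_iii, mean_vi, mean_i)
```

The simulated means should lie close to the theoretical values, though the agreement is only approximate for any finite number of trials.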


From the foregoing remarks it is evident that the kingpin of the system is Type III of which 
the Chi-Square distribution is a special case. We have already foreshadowed its relation to the 
normal distribution in 14.04. The remaining types are derivable, if we can express the dis- 
tribution of one score which is a function of another in terms of the distribution of the latter. 
This will be our preliminary task in 15.03. First, however, we may usefully recall and elaborate 
previous remarks on what we mean by a variate. 

Fig. 109. The Family Tree of the Pearson System. (For the rectangular case, n = 1.)


When we speak of a sample distribution, we presuppose the existence of a set of scores 
(univariate distribution) or of more than one such set (multivariate distribution) which constitute 
what we customarily speak of as the independent variable or variables. In this and the next 
chapter our concern will be only with univariate distributions, in which case what we refer to 
generically as the distribution itself is an expression connecting a particular value (x) of the 
score (variate) with a particular value of a variable y = F(x) (probability density) denoting the 
expected frequency of score values within a specifiable range including x itself. Thus y is what 
we ordinarily call a dependent variable. One reason for using the terms variate and probability 
density is that the words dependent and independent do not have the same meaning in statistics 
as in co-ordinate geometry or algebra. 

When we are talking about the real world, it is highly relevant to be clear about whether our 
scores increase by finite steps (discrete distribution) or otherwise. If they do so, a continuous 
curve can more or less adequately for practical purposes describe what we can legitimately
deduce about their frequencies from the theory of probability ; but it cannot do so exactly. 
Descriptive curve fitting of this sort takes its origin from the discoveries of de Moivre, D. 
Bernoulli and other eighteenth-century mathematicians who showed that the normal is a good 
descriptive curve for the exact r-fold sample distribution of the raw-score from a 2-class universe, 
as set forth in Chapter 3 of Vol. I. By hypothesis, the raw-score increases by unit steps, and 
we may then speak with propriety of the ordinate y as that of a frequency curve. Otherwise, as 
we shall now see, the expression is misleading. 


15.02 PROBABILITY DENSITY 


Probability density,* the dependent variable of a sampling distribution, is merely another name 
for the ordinate of the corresponding descriptive curve and the integrand of its area. Thus we 
should properly speak of the equation which specifies the relation between the ordinate y and 
the variate x as a p.d. equation. To get the distinction between frequency and probability 
density into focus, let us now recall the build-up of the histogram of a sample distribution.

The usefulness of the histogram as a visual device resides solely in the fact that it identifies 
the frequency of a score value or range of score values with an area, and hence clarifies the 
implication of using the integral of a continuous distribution as an approximate means of evaluat- 
ing the expectation of a score value within a specified range. Let us recall what we have already 
learned about it : 


(i) We label the mid-point of the base of each column by a particular score value x which
increases by equal increments $\Delta x$, so that its lower and upper boundaries lie respectively
at $(x - \frac12\Delta x)$ and $(x + \frac12\Delta x)$.

(ii) The area $\Delta A_x$ of a single column, which is the product of its height F(x) and its base
$\Delta x$, specifies the numerical value of the frequency of the corresponding score x, and
it is not inconsistent with the fact that the function is discrete to speak of this as the
probability that the score value will be in the range $(x \pm \frac12\Delta x)$.

(iii) Since frequency here signifies proportionate frequency, the total area of the histogram
is unity; and if the score x has values from −a to +b

$$\sum_{x=-a}^{x=+b} F(x)\,\Delta x = 1 = \sum_{x=-a}^{x=+b} \Delta A_x \qquad \text{(i)}$$

(iv) If we can find an expression f(x) defining a continuous curve which passes nearly
through the mid-points at the upper extremity of each column of the histogram,
having the property that the total area bounded thereby for positive values of x is
unity, we may write $F(x) \simeq f(x)$ for the value of the ordinate at x; and for the
approximate value of the expectation $E(\pm a)$ that x will lie in the range −a to +a
inclusive

$$E(\pm a) \simeq \int_{-(a+\frac12)}^{a+\frac12} f(x)\,dx \qquad \text{(ii)}$$

(v) When $\Delta x = 1$, i.e. when a discrete score increases by unit steps, $F(x) = F(x)\Delta x$ and
there is no need to distinguish between the frequency $\Delta A_x$ of the unique score value
x in the range $x \pm \frac12\Delta x$ and the ordinate $F(x) \simeq f(x)$ which we shall henceforth call
the probability density of the distribution.
* Contemporary usage prescribes the term p.d. only for the ordinate of a continuous variate. This restriction is
somewhat arbitrary, if only because the continuous variate is itself a fiction and the distinction we emphasise in this
context is easier to grasp if we approach it against the background of an actual and discrete distribution.

(vi) When a discrete score u does not increase by unit steps, we can still visualise its unique
frequency in the range $u \pm \frac12\Delta u$ as the area $\Delta A_u$ of a column of a histogram of height
F(u) on base $\Delta u$, if we postulate that the total area of the histogram is unity. For a
range of score values from −a to +b, we must then write

$$1 = \sum_{u=-a}^{u=+b} \Delta A_u = \sum_{u=-a}^{u=+b} F(u)\,\Delta u \qquad \text{(iii)}$$

If so, the ordinate F(u) of the column whose area defines the frequency of a unique
u-score value in the interval $u \pm \frac12\Delta u$ is not identical with its frequency.

(vii) To define the frequency of a score value explicitly the appropriate expression must
therefore include as a factor the increment by which it increases. If we postulate
that it increases continuously from a to b, (iii) will thus take the form

$$dA = f(u)\,du \quad \text{and} \quad \int_a^b f(u)\,du = 1 \qquad \text{(iv)}$$

When we say that f(u), a continuous function of a score u, is its probability density,
we thus mean that y = f(u) defines the ordinate of a smooth curve such that f(u)du
is the probability that the value of the score itself will lie in the interval $u \pm \frac12 du$.

We may sum up what has gone before as follows : 

(a) The definition of probability implies that the sum of the frequencies of all score values 
of a distribution is unity. If a distribution is continuous, the number of score values 
is infinite ; but we can visualise the frequency of score values within a specified range 
as an area on the understanding that the total area under the curve is unity. 

(b) This is consistent with the representation of the frequency of a discrete x-score which
increases by unit steps as the ordinate F(x) of a histogram column of unit base ($\Delta x = 1$),
because the area $F(x)\Delta x$ of the column is then equal to the ordinate and the total area
of the histogram is equal to the sum of the frequencies, i.e. to unity.

(c) If a discrete u-score increases by steps $\Delta u$ greater or less than unity, it has only one
value both greater than $u - \Delta u$ and less than $u + \Delta u$, and therefore only one value in
the range $u \pm \frac12\Delta u$. We can then represent its frequency in the range $u \pm \frac12\Delta u$ by
the rectangular area $F(u)\Delta u$ of a column on base $\Delta u$ and of height F(u) so defined that
the sum of all such areas is unity.

(d) If the score distribution is continuous, we conceive of F(u)du as an indefinitely small
element of area under a curve of unit total area, and accordingly define as F(u) . du the
probability that u lies within the range $u \pm \frac12 du$. So defined, F(u) is the ordinate of
the p.d. curve whose equation is y = F(u).
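
The distinction between frequency and ordinate summarised in (b) and (c) can be made concrete with a small computation. The sketch below, offered only as an illustration, takes the proportionate score u = x/r of a 16-fold sample from a symmetrical 2-class universe, so that the step is 1/16 rather than unity. The function names are hypothetical. Each column height is the frequency divided by the step, so that the column areas, not the heights, sum to unity.

```python
from math import comb


def binomial_frequencies(r, p):
    """Exact frequencies of the raw score x = 0, 1, ..., r."""
    q = 1.0 - p
    return [comb(r, x) * p**x * q**(r - x) for x in range(r + 1)]


def histogram_columns(r, p):
    """Columns (u, height, area) for the proportionate score u = x / r."""
    step = 1.0 / r                      # base of each column
    columns = []
    for x, freq in enumerate(binomial_frequencies(r, p)):
        height = freq / step            # ordinate chosen so that height * step = freq
        columns.append((x / r, height, height * step))
    return columns


cols = histogram_columns(16, 0.5)
total_area = sum(area for _, _, area in cols)
tallest = max(height for _, height, _ in cols)
print(total_area, tallest)
```

Because the step is less than unity, the tallest ordinate exceeds 1, which no frequency could do; this is one way to see why the ordinate and the frequency must be kept apart when the step is not unity.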

Since our first approach to a continuous sampling distribution is the derivation of the normal 
curve, it is of special importance to be clear about what we mean by a normal variate. We 
can derive the normal curve as the limiting contour of the histogram of $(q + p)^r$ when r is
indefinitely large. If so, we can express the frequency $F(X)\Delta X$ of the deviation X of the
raw-score (x) from its mean $M_x = rp$ by the approximate relation:

$$F(X)\,\Delta X \simeq (2\pi V_x)^{-\frac12} \exp\left(-\frac{X^2}{2V_x}\right)\Delta X \qquad \text{(v)}$$

In this case $F(X)\,(= Y_x)$, the ordinate of the curve for the appropriate value of X, is also its
approximate frequency. If $f_x$ is the frequency of the raw-score deviation (X = x − rp) we
may therefore write

$$f_x \simeq (2\pi V_x)^{-\frac12} \exp\left(-\frac{X^2}{2V_x}\right) \qquad \text{(vi)}$$

This is so because X increases by unit steps from $-M_x = -rp$ to $(r - M_x) = rq$ as x itself
increases by unit steps from 0 to r. The same successive frequencies which define x = 0, 1,
2 . . . r also specify proportionate score values:

$$u = 0, \frac{1}{r}, \frac{2}{r}, \ldots, 1.$$

The deviation (U = u − p) of the proportionate score (u) from its mean value (p) thus increases
by steps $\Delta U = r^{-1}$ from −p to q. If we substitute $\Delta U = r^{-1}$, $rU = X$ and $r^2 V_u = V_x$
on the right of (vi) above, it becomes

$$f_x \simeq (2\pi r^2 V_u)^{-\frac12} \exp\left(-\frac{U^2}{2V_u}\right)$$

The frequency ($f_x$) of the raw-score deviation (X = rU) is identical with that ($f_u$) of the
corresponding proportionate score deviation (U) so that

$$f_u \simeq (2\pi r^2 V_u)^{-\frac12} \exp\left(-\frac{U^2}{2V_u}\right) \qquad \text{(vii)}$$

The form of (vii) is not identical with that of (vi), but we may disclose the sense in which we can
properly say that the normal is a good descriptive function for both variates (X and U) when we
express $f_u$ in terms of the ordinate ($Y_u$) and base ($\Delta U$) of the corresponding column of the
histogram for the U-score distribution, i.e.

$$Y_u \cdot \Delta U = f_u = r^{-1} \cdot Y_u.$$

Whence, from (vii)

$$Y_u = (2\pi V_u)^{-\frac12} \exp\left(-\frac{U^2}{2V_u}\right) \qquad \text{(viii)}$$

The last equation is formally identical with (v), i.e. F(U) = F(X). Thus the frequency
equations of 2 nearly normal variates need not be formally identical. We speak of a variate
A of zero mean as normal if the ordinate (p.d.) equation is normal, i.e.

$$F(A) = (2\pi V_a)^{-\frac12} \exp\left(-\frac{A^2}{2V_a}\right)$$

Similarly, we label a variate as a function of any type by the name of the function which defines 
the p.d. Thus we speak of x as a Gamma variate when we mean that the ordinate y of the 
distribution is expressible as the integrand of a Gamma function (see p. 246). This is at first
confusing, since it is an inversion of the more common practice of attaching a verbal description 
to the dependent variable, e.g. y is a parabolic function of x if y = Ax? + B. 
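
The sense in which the normal is a good descriptive function for both the raw-score deviation X and the proportionate score deviation U can be checked numerically. The following sketch is an illustration under stated assumptions: r = 16 and p = ½, with the variance of the raw score taken as rpq and that of the proportionate score as pq/r. It compares the exact binomial frequency with the normal ordinate for X, and r times the same frequency (the column height for U) with the normal ordinate for U. The names used are hypothetical.

```python
from math import comb, exp, pi, sqrt


def normal_ordinate(deviation, variance):
    """Ordinate of a normal curve of zero mean and the stated variance."""
    return exp(-deviation**2 / (2.0 * variance)) / sqrt(2.0 * pi * variance)


r, p = 16, 0.5
q = 1.0 - p
v_x = r * p * q      # variance of the raw score
v_u = p * q / r      # variance of the proportionate score

rows = []
for x in range(r + 1):
    freq = comb(r, x) * p**x * q**(r - x)   # exact frequency, shared by X and U
    big_x = x - r * p                       # raw-score deviation
    big_u = x / r - p                       # proportionate score deviation
    rows.append((big_x, freq, normal_ordinate(big_x, v_x),
                 r * freq, normal_ordinate(big_u, v_u)))

for row in rows:
    print(row)
```

The second and third entries of each row agree closely, as do the fourth and fifth. The frequency itself is the same for X and U, but only for X (unit step) does it coincide with the ordinate.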

The preceding comparison between the build-up of the histogram for the raw-score and 
that of the proportionate score of the r-fold sample distribution from a 2-class universe which is 
both infinite and discrete offers a clue to the problem of defining the p.d. f(U) of a function 
$U = \phi(X)$ of a continuous variate X when we know F(X), the p.d. of X itself. If there is only
one value of U for each value of X and vice versa, the rule for a discrete variate is implicit in the
build up of the frequency histogram, viz.:

$$F(X)\,\Delta X = f(U)\,\Delta U,$$

$$\therefore\ f(U) = F(X)\frac{\Delta X}{\Delta U} \qquad \text{(ix)}$$

HISTOGRAM of SQUARE SCORE DEVIATION of a BINOMIAL VARIATE $(\frac12 + \frac12)^{16}$

  Q = X²:      0       1       4       9       16      25      36      49
  Frequency:   0.1964  0.3492  0.2442  0.1332  0.0555  0.0171  0.0036  0.0005

ISOMETRIC HISTOGRAM ditto

  X² = Q:      0       1       4       9       16      25      36      49
  ΔQ:          1       1       5       5       9       9       13      13
  F(Q):        0.1964  0.3492  0.0489  0.0267  0.0062  0.0019  0.0003  0.0000
  F(Q)ΔQ:      0.1964  0.3492  0.2442  0.1332  0.0555  0.0171  0.0036  0.0005


Fig. 110. Histogram of the Raw-Score deviation of the 16-fold sample from a symmetrical (p = ½ = q) 2-class
universe.


In the limit, this is equivalent to the following simple rule when the condition last stated* holds
good, viz.:

$$f(U) = F(X)\frac{dX}{dU} \qquad \text{(x)}$$


* The reader will probably recognise (x) as the principle implicit in the familiar trick of integration by substitution ; 
but few elementary textbooks on the infinitesimal calculus emphasise or illustrate, if indeed they mention, that the 
substitution is not always permissible in precisely this form. 

In the foregoing example,

$$U = \phi(X) = \frac{X}{r}.$$

Thus one value of X corresponds to one value of U and vice versa throughout the whole
score range negative and positive. It will be easy to see that this condition is highly relevant
if we set ourselves the task of visualising the distribution of the square raw-score deviation
$Q = (\pm X)^2$ of the r-fold sample from the infinite 2-class universe.

If p = ½ = q and r = 16, as in Fig. 110, X has zero mean and 17 discrete values 0, ±1,
±2, ±3, ±4, ±5, ±6, ±7, ±8 with frequencies defined by appropriate terms of $(\frac12 + \frac12)^{16}$,
viz. $16_{(8)} \cdot 2^{-16}$, $16_{(9)} \cdot 2^{-16}$, $16_{(10)} \cdot 2^{-16}$ . . . $16_{(16)} \cdot 2^{-16}$. Q has 9 discrete values, all positive,
0, 1, 4, 9, 16, 25, 36, 49, 64 with frequencies defined by $16_{(8)} \cdot 2^{-16}$, $2 \cdot 16_{(9)} \cdot 2^{-16}$, $2 \cdot 16_{(10)} \cdot 2^{-16}$
. . . $2 \cdot 16_{(16)} \cdot 2^{-16}$. Thus the frequency of the score X = −3 is $16_{(5)} \cdot 2^{-16} = 0.0666 = 16_{(11)} \cdot 2^{-16}$,
which is also the frequency of the score X = +3; and the frequency of the score $Q = 9 = (\pm 3)^2$
is therefore 2(0.0666) = 0.1332. We can represent the frequency of each Q score other than
Q = 0 as a rectangular column on unit base, if we make the height equal to 2F(X), since
$Q = (\pm X)^2$. When Q = 0, the appropriate height will be F(0). Such a procedure will leave
increasing gaps between successive columns from Q = 1 onwards as in the upper half of Fig.
111. If we were to draw a smooth curve closely following the contour of the frequency polygon
formed by joining the mid-points at the head of each column, the area of any segment except in
the interval Q = 0 to Q = 1 would therefore include that of empty spaces, and would greatly
exceed the sum of the frequencies of score values in the range cut off by it.

It is possible to make a histogram of the Q-score distribution uniformly dense (i.e. without
gaps), if we abandon the luxury of making the columns of equal width ($\Delta Q$) without relinquishing
the two fundamental conventions of the histogram of the X-score distribution, viz.: (a) the
mid-point of the base of each column specifies an actual value of Q; (b) the area of the column
specifies the frequency of the corresponding Q-score.

Fig. 111. Visualisation of the Square Score whose distribution is in Fig. 110.

The fact that the first two Q scores (Q = 0 and 1) are consecutive dictates how we space the
boundaries of the columns in conformity with (a). The boundaries of the columns representing
these two values are −½ to +½ and +½ to +1½. Thus the half width of the next column
(Q = 4) will be (4 − 1½) = 2½ so that $\Delta Q_4 = 5$. The base of this column ends at (4 + 2½) = 6½
and the half width of the next column (Q = 9) will be (9 − 6½) = 2½, so that $\Delta Q_9 = 5$. By
the same token, the half width of the next column (Q = 16) will be (16 − 11½) = 4½, so that
$\Delta Q_{16} = 9$ and so on.

Having spaced our columns so that the boundaries of each are equidistant from the point
which marks the corresponding Q-score on the base line, we can fulfil our second condition
by defining the height f(Q) of each column in terms of F(X) accordingly. Thus $f(0) = F(0)$,
$f(1)\Delta Q_1 = 2F(1)$, $f(4)\Delta Q_4 = 2F(2)$, $f(9)\Delta Q_9 = 2F(3)$, etc. For instance,

$$f(16)\,\Delta Q_{16} = 2F(4) = 0.0556,$$

$$\therefore\ f(16) = \tfrac{1}{9}(0.0556) = 0.0062.$$
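
The spacing rule just described can be carried out mechanically. The sketch below is an illustration only, with hypothetical variable names. It starts from the exact frequencies of the 16-fold symmetrical binomial, places each column boundary so that the Q-score lies at the mid-point of its base, and divides each frequency by the resulting width to obtain the column height.

```python
from math import comb

R = 16
HALF = R // 2

# Exact frequency of each raw-score deviation X = -8, ..., +8 for p = q = 1/2.
freq_x = {x: comb(R, x + HALF) / 2**R for x in range(-HALF, HALF + 1)}

# Square scores Q = X^2 and their frequencies (two X values per Q, except Q = 0).
q_values = [x * x for x in range(HALF + 1)]
freq_q = [freq_x[0]] + [2 * freq_x[x] for x in range(1, HALF + 1)]

widths, heights = [], []
lower = -0.5                      # lower boundary of the first column (Q = 0)
for q, freq in zip(q_values, freq_q):
    half_width = q - lower        # the Q-score must sit at the mid-point of the base
    widths.append(2 * half_width)
    heights.append(freq / (2 * half_width))
    lower = q + half_width        # upper boundary becomes the next lower boundary

for q, w, h, f in zip(q_values, widths, heights, freq_q):
    print(q, w, round(h, 4), round(f, 4))
```

The widths come out as 1, 1, 5, 5, 9, 9, 13, 13, 17, and the total area of the columns remains unity because each area is by construction the frequency of its Q-score.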

The values of $\Delta Q$ obtained in this way assume a more suggestive aspect if we place them
side by side with the corresponding values of X:

  ±X  =  0    1    2    3    4    5    6    7    8
  ΔQ  =  1    1    5    5    9    9   13   13   17

If we now put the mean values of $\Delta Q$ under the mean value of successive pairs of X-scores we get:

  X_m   =  ½   1½   2½   3½   4½   5½   6½   7½
  ΔQ_m  =  1    3    5    7    9   11   13   15

We now note that $\Delta Q_m = 2X_m$, just as

$$\frac{dQ}{dX} = 2X.$$

Such is the build up of the isometric histogram in the lower half of Fig. 111. Let us now ask
what it suggests. By removing the restriction that $\Delta Q$ must have a fixed value we have been
able to eliminate empty spaces between columns, thus leaving open the possibility of expressing
the sum of successive Q-score frequencies in terms of the area of a segment of a suitable fitting
curve. Except when Q = 0, the following relation then expresses the height of the columns
of the parent and the derived histogram:

$$2F(X)\,\Delta X = f(Q)\,\Delta Q,$$

$$\therefore\ f(Q) = 2F(X)\frac{\Delta X}{\Delta Q}.$$

The outcome suggests the following identity in the limit

$$f(Q) = 2F(X)\frac{dX}{dQ} \qquad \text{(xi)}$$


This expression is not identical with (ix), because each value of Q corresponds to two values
of X (other than X = 0), and the frequency of any value of Q (other than zero) is therefore the
sum of 2 X-score frequencies, in this case equal because F(X) is a symmetrical function of X
itself, being a term of the expansion of $(\frac12 + \frac12)^r$. If this is not so, as when the definitive binomial
of the distribution is $(\frac34 + \frac14)^r$, we should write

$$f(Q)\,\Delta Q = [F(-X) + F(X)]\,\Delta X \qquad \text{(xii)}$$

In the limit this is equivalent to

$$f(Q) = F(-X)\frac{dX}{dQ} + F(X)\frac{dX}{dQ} \qquad \text{(xiii)}$$

The above defines the p.d. of any continuous variate Q in terms of the p.d. of X when X admits
numerically equal negative and positive values. The converse case is of equal interest. We now
suppose that: (a) we know the p.d. F(x) of a score x whose values lie only in the positive range;
(b) a score w has numerically equal positive and negative values for each value of x, as when
w = ± x, so that

$$f(-w) + f(w) = F(x)\frac{dx}{dw} \qquad \text{(xiv)}$$

If we have sufficient reason for assuming that f(w) is a symmetrical function of w, we may then
put

$$f(w) = \tfrac12 F(x)\frac{dx}{dw} \qquad \text{(xv)}$$


15.03 FUNCTIONAL TRANSFORMATIONS OF SAMPLE DISTRIBUTIONS


In 15.01 we have had occasion to remark that 
(a) the significance tests in most common use pre-suppose sampling in a normal universe ; 


(b) each is referable to one or other of a system of curves first investigated from a purely 
formal viewpoint by Karl Pearson ; 


(c) it is possible to exhibit the derivation of any one of them by recourse to some functional 
score transformation of the sort we have now explored in a preliminary way or to 
considerations based on the theory of sampling from a parent universe whose u.s.d. 
we know. 


In 15.02 we have distinguished between 3 situations which arise when we seek an expression 
for the p.d. f(z) of one variate (z) which is a specifiable function $\phi(x)$ of a second variate x,
and we also know F(x), the p.d. of x itself. We shall now use as illustrations variates 
assignable to the Pearson system. 

Case I. The variate z = kx is a multiple or submultiple of the variate x so that the two 
distributions differ only w.r.t. scale. This includes the first example cited in 15.02 and satisfies 
the condition appropriate to (x) therein, since z has only one value for each value of x and vice 
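versa. The rule is

$$f(z) = F(x)\frac{dx}{dz}.$$

Since z = ka when x = a and z = kb when x = b (> a), the corresponding integrals are

$$\int_{ka}^{kb} f(z)\,dz = \int_a^b F(x)\,dx.$$

In this case the differential coefficient is a simple scalar factor, i.e.

$$\frac{dx}{dz} = \frac{1}{k},$$

$$\therefore\ f(z) = \frac{1}{k}F(x) = \frac{1}{k}F\left(\frac{z}{k}\right) \qquad \text{(i)}$$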

Example (1). The standard score is $c = (X \div \sigma)$ and

$$F(X) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{X^2}{2\sigma^2}\right)$$

In this case $k = 1 \div \sigma$ and (i) becomes

$$f(c) = \sigma F(\sigma c) = (2\pi)^{-\frac12} e^{-\frac12 c^2}.$$
Case II. The variate z is a single-valued monotonic function of x decreasing as x increases.
This case also falls into line with (x) of 15.02, since z has one value for each value of x and vice versa.
To make clear what we are doing when we use the tabulated functions to specify a segment of
area, we shall assume that z = a when x = c and z = b when x = d.

Example (2). Let x be a score whose p.d. is that of a Beta variate of unlimited range, i.e. Pearson's
Type VI, defined by

$$F(x) = \frac{1}{B(j,k)}\cdot\frac{x^{j-1}}{(1+x)^{j+k}}. \qquad \text{(ii)}$$

We wish to determine f(z), the p.d. of a score defined by

$$z = \frac{1}{1+x}. \qquad \text{(iii)}$$

As before, F(x)dx is numerically equal to f(z)dz; but the fact that x increases while z diminishes means
that c > d if a < b, since b = 1 ÷ (1 + d) and a = 1 ÷ (1 + c). This means that for all values of d or c
and corresponding values of b and a,

$$\int_{a}^{b} f(z)\,dz = -\int_{c}^{d} F(x)\,dx.$$

To interpret the sign of the integral correctly we have therefore to put

$$-F(x)\,\frac{dx}{dz} = f(z).$$

In accordance with (iii)

$$x = \frac{1-z}{z}; \qquad 1 + x = \frac{1}{z}; \qquad \frac{dx}{dz} = -\frac{1}{z^{2}}.$$

Whence we have

$$F(x)\cdot\frac{dx}{dz} = -\frac{z^{k-1}(1-z)^{j-1}}{B(j,k)}.$$

We may therefore write

$$f(z) = \frac{z^{k-1}(1-z)^{j-1}}{B(j,k)}. \qquad \text{(iv)}$$

This is an important result, since z is a Beta variate of unit range, being a case of Pearson's Type I.
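The equality of corresponding areas in Example (2) can be checked numerically. The sketch below integrates the Type VI ordinate over the x-range and the Type I ordinate over the matching z-range by Simpson's rule; the exponents j = 2, k = 3 and the limits a = 0.2, b = 0.6 are arbitrary illustrative values, and the helper names are our own.

```python
import math

def beta_fn(j, k):
    """Complete Beta function B(j, k) expressed through the Gamma function."""
    return math.gamma(j) * math.gamma(k) / math.gamma(j + k)

def type_vi_pd(x, j, k):
    """Beta variate of unlimited range (Pearson's Type VI with unit scale)."""
    return x ** (j - 1) / (beta_fn(j, k) * (1 + x) ** (j + k))

def type_i_pd(z, j, k):
    """Beta variate of unit range obtained from z = 1/(1 + x)."""
    return z ** (k - 1) * (1 - z) ** (j - 1) / beta_fn(j, k)

def simpson(g, lo, hi, steps=2000):
    """Composite Simpson rule; steps must be even."""
    h = (hi - lo) / steps
    total = g(lo) + g(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3

j, k = 2.0, 3.0               # illustrative exponents
a, b = 0.2, 0.6               # limits for z, with a < b
c, d = 1 / a - 1, 1 / b - 1   # corresponding limits for x, so that c > d

area_z = simpson(lambda z: type_i_pd(z, j, k), a, b)
area_x = simpson(lambda x: type_vi_pd(x, j, k), d, c)  # integrate upwards from d to c
print(area_z, area_x)
```

Integrating the x-ordinate from d up to c, rather than from c to d, is the numerical counterpart of the change of sign discussed in the text.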


Case III. The parent score x has a symmetrical distribution with two values corresponding
to one value of u, as when u = (± x)². We shall assume that the range of x is from − ∞ to
+ ∞, so that the range of u is from 0 to ∞. Thus x will lie in the range from − d to − c
and from + c to + d when u lies in the range from a = (± c)² to b = (± d)². This is the
situation to which (xiii) of 15.02 applies. For any values of c and d with corresponding values
of a and b,

$$\int_{-d}^{-c} F(x)\,dx + \int_{c}^{d} F(x)\,dx = \int_{a}^{b} f(u)\,du.$$

If F(x) is a symmetrical function of x, (xv) of 15.02 holds good, i.e.

$$2\int_{c}^{d} F(x)\,dx = \int_{a}^{b} f(u)\,du.$$

When u = x², so that x = u^{1/2},

$$\frac{dx}{du} = \tfrac{1}{2}u^{-\frac{1}{2}} \quad\text{and}\quad 2F(x)\,\frac{dx}{du} = u^{-\frac{1}{2}}F(x); \qquad \therefore\; f(u) = u^{-\frac{1}{2}}F(x). \qquad \text{(v)}$$

Example (3). To find the p.d. of a square normally distributed score deviation. By definition

$$F(X) = \frac{1}{\sqrt{2\pi V}}\exp\left(-\frac{X^{2}}{2V}\right) \quad\text{and}\quad u = X^{2};$$

$$\therefore\; F(X) = \frac{1}{\sqrt{2\pi V}}\exp\left(-\frac{u}{2V}\right) \quad\text{and}\quad \frac{dX}{du} = \tfrac{1}{2}u^{-\frac{1}{2}};$$

$$\therefore\; f(u) = 2F(X)\,\frac{dX}{du} = \frac{1}{\sqrt{2\pi V}}\,e^{-u/2V}\,u^{-\frac{1}{2}}. \qquad \text{(vi)}$$

The particular case where V = 1 is of special interest. If c is the standard score (normal score of unit
variance) and C = c²,

$$F(c) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}c^{2}} = \frac{(\tfrac{1}{2})^{\frac{1}{2}}}{\Gamma(\tfrac{1}{2})}\,e^{-\frac{1}{2}C};$$

$$\therefore\; f(C) = \frac{(\tfrac{1}{2})^{\frac{1}{2}}}{\Gamma(\tfrac{1}{2})}\,e^{-\frac{1}{2}C}\,C^{\frac{1}{2}-1} = \frac{1}{\sqrt{2\pi}}\,C^{-\frac{1}{2}}e^{-\frac{1}{2}C}. \qquad \text{(vii)}$$

The last equation is of fundamental importance in sampling theory, being the special case of Pearson's
Type III known as the distribution of Chi-Square for 1 degree of freedom. It enables us to define the
expectation of a value of the square standard score not exceeding C = h by the relation

$$E(C \le h) = \frac{1}{\sqrt{2\pi}}\int_{0}^{h} C^{-\frac{1}{2}}e^{-\frac{1}{2}C}\,dC. \qquad \text{(viii)}$$

This is, of course, equivalent to the expectation that c will not lie outside the range ± √h,* i.e.

$$E(-\sqrt{h} \le c \le \sqrt{h}) = \frac{1}{\sqrt{2\pi}}\int_{-\sqrt{h}}^{\sqrt{h}} e^{-\frac{1}{2}c^{2}}\,dc.$$

For what follows we shall need to know something about the properties of the curve defined by (vii).
Accordingly, we determine

$$\frac{d}{dC}f(C) = -\frac{1}{2\sqrt{2\pi}}\,C^{-\frac{3}{2}}e^{-\frac{1}{2}C}\,(1 + C).$$

When the first derivative is zero, we therefore have

$$C = -1.$$

Thus f(C) has no turning point within its prescribed range. It decreases continuously as C varies from
0 to ∞.

* It is now common to cite in tabular form the probability that Chi-Square for 1 d.f. will not exceed a particular value.
It is therefore pertinent to remark that such a table conveys no information we cannot derive from the table of the
normal function. Thus the probability that a value of Chi-Square (C) will not exceed 9 is simply the probability that
the numerical value of the standard score c = (x ÷ σ) will not be outside the range ± 3.
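The remark in the footnote lends itself to a short numerical illustration. The sketch below obtains the Chi-Square expectation for 1 d.f. in two ways: from the normal integral (through the error function of the standard library) and by direct integration of the Chi-Square ordinate after the substitution C = s². The function names are our own.

```python
import math

def chi_square_1df_cdf(h):
    """Expectation that C = c**2 does not exceed h, i.e. that c lies within +/- sqrt(h)."""
    return math.erf(math.sqrt(h / 2))

def chi_square_1df_cdf_numeric(h, steps=2000):
    """Integrate the 1 d.f. Chi-Square ordinate from 0 to h by Simpson's rule.

    The substitution C = s*s turns C**-0.5 * exp(-C/2) dC into 2 * exp(-s*s/2) ds,
    which removes the infinite ordinate at C = 0.
    """
    upper = math.sqrt(h)
    width = upper / steps
    total = 1.0 + math.exp(-upper * upper / 2)   # end ordinates of exp(-s*s/2)
    for i in range(1, steps):
        s = i * width
        total += (4 if i % 2 else 2) * math.exp(-s * s / 2)
    return 2 * (total * width / 3) / math.sqrt(2 * math.pi)

for h in (1.0, 4.0, 9.0):
    print(h, chi_square_1df_cdf(h), chi_square_1df_cdf_numeric(h))
```

For h = 9 both routes give about 0.9973, the familiar expectation that a normal score lies within three standard deviations of the mean.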
Case IV. We now suppose that u has two values for each value of a parent variate x, confining
ourselves to the special case u = ± √x. For real values of u that of x must be positive
and we assume that its range is from 0 to ∞, so that the range of u is from − ∞ to + ∞, whence
this is the case covered by (xiv) of 15.02, i.e.

$$\int_{c}^{d} F(x)\,dx = \int_{-b}^{-a} f(u)\,du + \int_{a}^{b} f(u)\,du.$$

If we know that f(u) is a symmetrical function of u, (xv) of 15.02 holds good, i.e.

$$\int_{c}^{d} \tfrac{1}{2}F(x)\,dx = \int_{a}^{b} f(u)\,du.$$

When u = ± √x so that x = u²,

$$\frac{dx}{du} = 2u \quad\text{and}\quad f(u) = \tfrac{1}{2}F(x)\cdot\frac{dx}{du}; \qquad \therefore\; f(u) = u\cdot F(x) = x^{\frac{1}{2}}F(x). \qquad \text{(ix)}$$

Example (4). We may reverse the procedure of the last example to obtain from (viii) the normal
distribution of unit variance as that of the square root of the Chi-Square variate (C = c²) for 1 degree of
freedom. In accordance with (ix)

$$f(c) = C^{\frac{1}{2}}F(C).$$

In this expression

$$F(C) = \frac{1}{\sqrt{2\pi}}\,C^{-\frac{1}{2}}e^{-\frac{1}{2}C};$$

$$\therefore\; C^{\frac{1}{2}}F(C) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}c^{2}} = f(c).$$

Example (5). An analogous transformation involves the relation between Pearson's Types VI and
VII. We define the former in the range x = 0 to x = ∞ as the p.d. of a score x such that for positive
values of j and k,

$$F(x) = \frac{1}{B(j,k)}\cdot\frac{x^{j-1}}{(1+x)^{j+k}}. \qquad \text{(x)}$$

A case of special interest arises when j = ½, so that

$$F(x) = \frac{1}{B(\tfrac{1}{2},k)}\cdot\frac{x^{-\frac{1}{2}}}{(1+x)^{k+\frac{1}{2}}}. \qquad \text{(xi)}$$

For reasons which we shall mention later this describes the distribution of a statistic whose square root
(u = √x) has a symmetrical distribution. Whence, in virtue of (ix),

$$f(u) = x^{\frac{1}{2}}F(x) = \frac{1}{B(\tfrac{1}{2},k)}\cdot\frac{1}{(1+x)^{k+\frac{1}{2}}}.$$

By substituting x = u², we thus obtain

$$f(u) = \frac{1}{B(\tfrac{1}{2},k)\,(1+u^{2})^{k+\frac{1}{2}}}. \qquad \text{(xii)}$$

This is Pearson's Type VII. By hypothesis the range of u in (xii) is from − ∞ to + ∞ and f(u) is
a symmetrical function, being equal for numerically identical positive and negative values of u. Hence
the mean value of u is zero. By differentiating we obtain

$$D_{u}\,f(u) = -\frac{(2k+1)\,u}{B(\tfrac{1}{2},k)\,(1+u^{2})^{k+\frac{3}{2}}}.$$

Hence D_u . f(u) = 0 when u = 0. Thus the distribution has, like the normal, one mode coincident with
the mean and extends asymptotically to the u-axis at infinity in both directions. The parent distribution
whose p.d. is F(x) is a monotonic decreasing function of x from 0 to ∞ when j = ½, as we see if we
differentiate (x), i.e.

$$D_{x}\,F(x) = \frac{1}{B(j,k)}\left[(j-1)\,x^{j-2}(1+x)^{-j-k} - (k+j)\,x^{j-1}(1+x)^{-j-k-1}\right].$$

Whence D_x . F(x) = 0, if

$$x = \frac{j-1}{k+1}.$$

Thus there will be a turning point in the positive range x = 0 to x = ∞ only if j > 1, and the
distribution is then unimodal.


15.04 THE PEARSON SYSTEM 


In Chapter 6 of Vol. I we have had a preview of a system of curves suitable for describing
sampling distributions. The important types of the Pearson system are the incomplete Gamma
functions (Type III), incomplete Beta functions (Types I, II and VI) and one derivable from
the latter (Type VII). The special interest of these functions arises partly from the fact that
they emerge as approximate expressions for the binomial and the hypergeometric distributions
when the sample is large, but more especially for a reason stated in Chapter 6. If x is the
independent variable of the incomplete Gamma or Beta function, the integrand contains a power
of x as a factor. Hence the zero moments of a variate so defined are of the same form as the
complete function, and are easily specifiable. It is thus possible to define all the relevant
constants of such a curve in terms of the moments of a distribution with a view to exploring
its use as a satisfactory fitting curve. In Chapter 6 of Vol. I, we have examined only the properties
of Types I-III. In this chapter we shall first recall their properties and then deal with Types
VI-VII.
Pearson's Type III. The complete Gamma function is the integral (n being positive)

$$\int_{0}^{\infty} e^{-x}x^{n-1}\,dx = \Gamma(n).$$

The fundamental property of Γ(n) is (p. 246, Vol. I)

$$\Gamma(n+1) = n\,\Gamma(n); \qquad \text{(i)}$$

$$\therefore\; \Gamma(n+2) = (n+1)\,\Gamma(n+1) = (n+1)^{(2)}\,\Gamma(n);$$

$$\Gamma(n+3) = (n+2)\,\Gamma(n+2) = (n+2)^{(3)}\,\Gamma(n).$$

Whence by iteration, we see that

$$\Gamma(n+r) = (n+r-1)^{(r)}\,\Gamma(n). \qquad \text{(ii)}$$

By the same token, we may write

$$\Gamma(n-1) = \frac{\Gamma(n)}{n-1}; \qquad \Gamma(n-2) = \frac{\Gamma(n)}{(n-1)^{(2)}}.$$

Whence also by iteration:

$$\Gamma(n-r) = \frac{\Gamma(n)}{(n-1)^{(r)}}. \qquad \text{(iii)}$$

Fig. 112. Chi-Square Distributions (Pearson Type III). The Chi-Square variate of f degrees of freedom is the Type III variate when
(as here) k = ½ (or j = ½) in Fig. 109 and n (or m) = ½f. Note that Chi-Square for 2 d.f. (n = 1) is monotonic, but
Chi-Square for 4 d.f. (n = 2) has a mode in the positive range of the variate.

In accordance with (i), Γ(n) = (n − 1)! when n is a positive integer, so that
Γ(2) = 1! = 1 = 0! = Γ(1).
If n is a negative integer Γ(n) = ± ∞. When n = ½, Γ(n) = √π, whence from (i)

$$\Gamma(\tfrac{3}{2}) = \tfrac{1}{2}\sqrt{\pi}; \quad \Gamma(\tfrac{5}{2}) = \tfrac{3}{4}\sqrt{\pi}; \quad \Gamma(\tfrac{7}{2}) = \tfrac{15}{8}\sqrt{\pi}; \quad \Gamma(\tfrac{9}{2}) = \tfrac{105}{16}\sqrt{\pi}. \qquad \text{(iv)}$$
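The iteration rules (ii) and (iii) are easy to exercise numerically. The sketch below builds the descending product x^(r) = x(x − 1)(x − 2)... and uses it to pass from Γ(½) = √π to the half-integer values quoted in (iv), comparing each with the Gamma function of the Python standard library. The helper names are our own.

```python
import math

def descending_product(x, r):
    """x^(r) = x(x - 1)(x - 2)... to r factors, with x^(0) = 1."""
    product = 1.0
    for i in range(r):
        product *= x - i
    return product

def gamma_up(n, r):
    """Rule (ii): Gamma(n + r) obtained from Gamma(n)."""
    return descending_product(n + r - 1, r) * math.gamma(n)

def gamma_down(n, r):
    """Rule (iii): Gamma(n - r) obtained from Gamma(n)."""
    return math.gamma(n) / descending_product(n - 1, r)

root_pi = math.sqrt(math.pi)
for r, fraction in ((1, 1 / 2), (2, 3 / 4), (3, 15 / 8), (4, 105 / 16)):
    print(r, gamma_up(0.5, r), fraction * root_pi, math.gamma(0.5 + r))

# Rule (iii) also carries us below 1/2: Gamma(-1/2) = Gamma(1/2) / (-1/2).
print(gamma_down(0.5, 1), math.gamma(-0.5))
```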


It is convenient to express the complete Gamma function in the more general form

$$\int_{0}^{\infty} e^{-kx}x^{n-1}\,dx = \frac{\Gamma(n)}{k^{n}}; \qquad \text{(v)}$$

$$\therefore\; \frac{k^{n}}{\Gamma(n)}\int_{0}^{\infty} e^{-kx}x^{n-1}\,dx = 1.$$

Accordingly we define a Gamma variate of scale k and exponent (n − 1) as a score x whose
probability density is given by

$$f(x) = \frac{k^{n}\,e^{-kx}\,x^{n-1}}{\Gamma(n)}. \qquad \text{(vi)}$$

This is the p.d. equation of Pearson's Type III in the range x = 0 to x = ∞. By definition,
the rth zero moment is given by

$$\mu_{r} = \frac{k^{n}}{\Gamma(n)}\int_{0}^{\infty} e^{-kx}x^{n+r-1}\,dx;$$

$$\therefore\; \mu_{r} = \frac{\Gamma(n+r)}{k^{r}\,\Gamma(n)}.$$

Hence from (ii)

$$\mu_{r} = k^{-r}(n+r-1)^{(r)}. \qquad \text{(vii)}$$

In the same way we may derive the negative zero moments defined by (vi) in 11.03, viz.:

$$\mu_{-r} = \frac{k^{n}}{\Gamma(n)}\int_{0}^{\infty} e^{-kx}x^{n-r-1}\,dx = \frac{k^{r}\,\Gamma(n-r)}{\Gamma(n)}.$$

Whence from (iii) above

$$\mu_{-r} = \frac{k^{r}}{(n-1)^{(r)}}. \qquad \text{(viii)}$$

From (vii) and (viii) it is possible to obtain the moments of the distribution of a score z which is
the ratio of two independent sample scores u and v, each a Gamma variate, i.e.

$$z = \frac{u}{v}. \qquad \text{(ix)}$$

Our interest in this function arises from the fact that an important class of significance tests
depends on the distribution of the ratio of sums of squares, and that the distribution of a sum
of square normally distributed scores is of Type III. For that reason we need consider only
the case of two variates u and v with the same scalar constant k, i.e.

$$f(u) = \frac{k^{m}e^{-ku}u^{m-1}}{\Gamma(m)} \quad\text{and}\quad f(v) = \frac{k^{n}e^{-kv}v^{n-1}}{\Gamma(n)}.$$

If u and v are independent variates, we may employ (vii) of 11.03 to obtain the moments of z,
i.e.

$$\mu_{r}(z) = \mu_{r}(u)\cdot\mu_{-r}(v). \qquad \text{(x)}$$

From (vii) and (viii) we obtain

$$\mu_{r}(u) = k^{-r}(m+r-1)^{(r)}; \qquad \mu_{-r}(v) = \frac{k^{r}}{(n-1)^{(r)}}.$$

Whence by substitution in (x), we get a result which we shall later see to be important:

$$\mu_{r}(z) = \frac{(m+r-1)^{(r)}}{(n-1)^{(r)}}. \qquad \text{(xi)}$$
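The ratio-moment result can be checked against direct numerical integration. In the sketch below the positive moment of u and the negative moment of v are each found by Simpson's rule from the Gamma ordinate, and their product is compared with the closed form in descending products. The exponents m = 4, n = 5, the scale k = 2 and the order r = 2 are arbitrary illustrative values; the helper names are our own. The integration requires n − r − 1 to be non-negative so that the integrand stays finite at the origin.

```python
import math

def descending_product(x, r):
    """x^(r) = x(x - 1)(x - 2)... to r factors, with x^(0) = 1."""
    product = 1.0
    for i in range(r):
        product *= x - i
    return product

def simpson(g, lo, hi, steps=4000):
    """Composite Simpson rule; steps must be even."""
    h = (hi - lo) / steps
    total = g(lo) + g(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3

def numeric_moment(power, n, k, upper=40.0):
    """Zero moment of order `power` (which may be negative) of a Gamma variate.

    The powers of x are combined before evaluation, so the integrand is finite
    at x = 0 whenever n - 1 + power >= 0.
    """
    scale = k ** n / math.gamma(n)
    return simpson(lambda x: scale * x ** (n - 1 + power) * math.exp(-k * x), 0.0, upper)

m, n, k, r = 4.0, 5.0, 2.0, 2   # illustrative values only
ratio_numeric = numeric_moment(r, m, k) * numeric_moment(-r, n, k)
ratio_formula = descending_product(m + r - 1, r) / descending_product(n - 1, r)
print(ratio_numeric, ratio_formula)
```

The scale k cancels from the product, as the closed form implies: changing k in the sketch leaves both printed values unchanged.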


From (vii), we derive the following values of the zero moments of the Gamma function:

$$\mu_{1} = \frac{n}{k}; \quad \mu_{2} = \frac{(n+1)^{(2)}}{k^{2}}; \quad \mu_{3} = \frac{(n+2)^{(3)}}{k^{3}}; \quad \mu_{4} = \frac{(n+3)^{(4)}}{k^{4}}.$$

We now recall (xxx) in 11.02, viz.:

$$m_{2} = \mu_{2} - \mu_{1}^{2};$$

$$m_{3} = \mu_{3} - 3\mu_{2}\mu_{1} + 2\mu_{1}^{3};$$

$$m_{4} = \mu_{4} - 4\mu_{3}\mu_{1} + 6\mu_{2}\mu_{1}^{2} - 3\mu_{1}^{4}.$$

Whence we derive

$$m_{2} = \frac{n}{k^{2}}; \quad m_{3} = \frac{2n}{k^{3}}; \quad m_{4} = \frac{3n(n+2)}{k^{4}};$$

$$\therefore\; \beta_{1} = \frac{4}{n}; \qquad \beta_{2} = 3 + \frac{6}{n}. \qquad \text{(xii)}$$

From (xii) we see that β₁ suffices to define β₂, and that β₂ > 3, so that the distribution is both
skew and leptokurtic. When n is very large, the first two Pearson coefficients do not sensibly
differ from those of the normal curve. By differentiating to obtain the maximum value of f(x),
we can obtain the distance of the mode from the origin:

$$D\,f(x) = \frac{k^{n}}{\Gamma(n)}\left[(n-1)\,x^{n-2}e^{-kx} - k\,x^{n-1}e^{-kx}\right].$$

Whence D . f(x) = 0 when

$$x = \frac{n-1}{k}. \qquad \text{(xiii)}$$

The distance between the mode and mean is

$$\frac{n-1}{k} - \frac{n}{k} = -\frac{1}{k}. \qquad \text{(xiv)}$$

For the distribution we have elsewhere (p. 644) called Chi-Square for 1 d.f., k = ½ = n and
x in (xiii) is negative, i.e. there is no mode in the positive range, the curve being monotonic.
The Type III distribution is indeed unimodal only if n > 1. From (xiv) we then see that the
mode is to the left of the mean, the distribution being therefore most steep on the side nearest
the origin. More generally, we speak of the Type III distribution defined by (vi) above as the
Chi-Square distribution for 2n degrees of freedom when k = ½. Thus Chi-Square for 2 d.f. is
also monotonic, since n = 1 and x = 0 in (xiii); but Chi-Square for 3 d.f. has a maximum value
in the positive range, since n = 3/2 and (n − 1) = ½.

Evidently, the relation β₂ = 3(1 + ½β₁) restricts the suitability of the Type III distribution
as a good fitting curve for a distribution which is both skew and leptokurtic. Its special interest
for our purpose arises from the circumstance that it includes the Chi-Square distributions as
a special case when k = ½; and the importance of the latter resides in the fact that they describe
the distribution of the square deviation (Chi-Square for 1 d.f.) of a normal variate and (as we shall
later see) of the sum of f independent square normal score deviations of unit variance (Chi-Square
for f degrees of freedom). Hence it is of fundamental relevance to statistical problems
involving the distribution of variance estimates.

Since f = 2n by definition, we write Type III in Chi-Square form as

$$f(x) = \frac{e^{-\frac{1}{2}x}\,x^{\frac{1}{2}f-1}}{2^{\frac{1}{2}f}\,\Gamma(\tfrac{1}{2}f)}.$$

From (vii) we obtain the zero moments of the distribution as

$$\mu_{1} = f; \quad \mu_{2} = f(f+2); \quad \mu_{3} = f(f+2)(f+4); \quad \mu_{4} = f(f+2)(f+4)(f+6), \text{ etc.}$$
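The passage from zero moments to Pearson coefficients can be traced numerically. The sketch below computes the first four zero moments of a Gamma variate from the descending-product formula, converts them to mean moments by the usual relations, and forms the first two Pearson coefficients. It is run for the Chi-Square case k = ½, n = ½f with f = 6, an arbitrary illustrative choice; the function names are our own.

```python
def descending_product(x, r):
    """x^(r) = x(x - 1)(x - 2)... to r factors, with x^(0) = 1."""
    product = 1.0
    for i in range(r):
        product *= x - i
    return product

def gamma_zero_moment(r, n, k):
    """r-th zero moment of a Gamma variate of scale k and exponent n - 1."""
    return descending_product(n + r - 1, r) / k ** r

def mean_moments(mu1, mu2, mu3, mu4):
    """Second, third and fourth mean moments from the first four zero moments."""
    m2 = mu2 - mu1 ** 2
    m3 = mu3 - 3 * mu2 * mu1 + 2 * mu1 ** 3
    m4 = mu4 - 4 * mu3 * mu1 + 6 * mu2 * mu1 ** 2 - 3 * mu1 ** 4
    return m2, m3, m4

def pearson_coefficients(n, k):
    """First two Pearson coefficients: m3^2 / m2^3 and m4 / m2^2."""
    m2, m3, m4 = mean_moments(*(gamma_zero_moment(r, n, k) for r in (1, 2, 3, 4)))
    return m3 ** 2 / m2 ** 3, m4 / m2 ** 2

f = 6                                   # degrees of freedom, illustrative
zero_moments = [gamma_zero_moment(r, f / 2, 0.5) for r in (1, 2, 3, 4)]
beta1, beta2 = pearson_coefficients(n=f / 2, k=0.5)
print(zero_moments)                     # f, f(f+2), f(f+2)(f+4), f(f+2)(f+4)(f+6)
print(beta1, beta2, 4 / (f / 2), 3 + 6 / (f / 2))
```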
Types I and II. The complete Beta function is expressible* alternatively as an integral
of limited or unlimited range, viz.:

$$\int_{0}^{1} x^{j-1}(1-x)^{k-1}\,dx = B(j,k) = \int_{0}^{\infty} \frac{x^{j-1}}{(1+x)^{j+k}}\,dx. \qquad \text{(xv)}$$

The Beta function B(j, k) is expressible in terms of the Gamma function, viz.:

$$B(j,k) = \frac{\Gamma(j)\,\Gamma(k)}{\Gamma(j+k)}.$$

We may adapt as follows the left-hand expression of (xv) to define a variate of restricted range
from 0 to a:

$$\int_{0}^{a} x^{j-1}(a-x)^{k-1}\,dx = a^{j+k-1}B(j,k). \qquad \text{(xvi)}$$

Accordingly, we may define a Beta variate of restricted range by the equation

$$f(x) = \frac{x^{j-1}(a-x)^{k-1}}{a^{j+k-1}\,B(j,k)}. \qquad \text{(xvii)}$$

Equation (xvii) defines Pearson's Type I of which Type II is a special case when j = k, so that

$$f(x) = \frac{x^{j-1}(a-x)^{j-1}}{a^{2j-1}\,B(j,j)}. \qquad \text{(xviii)}$$

The general expression for the zero moments is

$$\mu_{r} = \frac{1}{a^{j+k-1}B(j,k)}\int_{0}^{a} x^{j+r-1}(a-x)^{k-1}\,dx = \frac{a^{r}\,B(j+r,k)}{B(j,k)} = \frac{a^{r}\,(j+r-1)^{(r)}}{(j+k+r-1)^{(r)}}. \qquad \text{(xix)}$$

Whence the mean of the distribution is

$$\mu_{1} = \frac{aj}{j+k}.$$

On differentiating f(x) in (xvii) we get

$$D_{x}\,f(x) = \frac{(j-1)\,x^{j-2}(a-x)^{k-1} - (k-1)\,x^{j-1}(a-x)^{k-2}}{a^{j+k-1}\,B(j,k)}.$$

Hence D_x . f(x) = 0 when

$$x = \frac{a(j-1)}{j+k-2}.$$

This defines a turning point (maximum) within the prescribed positive range if j and k, being
greater than unity, are (as we here assume) of the same sign, viz. positive. The score value
(x) so defined is then that at the mode and (x − μ₁) is the distance of the mode from the mean,
so that

$$x - \mu_{1} = \frac{a(j-1)}{j+k-2} - \frac{aj}{j+k} = \frac{a(j-k)}{(j+k)(j+k-2)}.$$

* j and k being positive.
Fig. 113. Pearson's Type I (often a good approximation for the hypergeometric distribution for sampling without
replacement in a 2-class universe). The curve shown is y = x^{j−1}(1 − x)^{k−1} ÷ B(j, k) with j = 2, k = 5.


Thus x > μ₁, and the mode lies to the right of the mean if j > k, otherwise to the left of it. The
mode is the midpoint of the range and the distribution is symmetrical if x = ½a, i.e. if j = k,
when μ₁ = ½a and the mean coincides with the mode. As we have seen, this condition defines
Pearson's Type II as a special case of Type I of which (xviii) is the definitive equation. From
(xvi) we obtain the first four zero moments of Type II as

$$\mu_{1} = \frac{a}{2}; \quad \mu_{2} = \frac{a^{2}(j+1)}{2(2j+1)}; \quad \mu_{3} = \frac{a^{3}(j+2)}{4(2j+1)}; \quad \mu_{4} = \frac{a^{4}(j+2)(j+3)}{4(2j+1)(2j+3)}.$$

Accordingly, we derive

$$m_{2} = \frac{a^{2}}{4(2j+1)}; \quad m_{3} = 0; \quad m_{4} = \frac{3a^{4}}{16(2j+1)(2j+3)}.$$

We thus obtain

$$\beta_{1} = 0 \quad\text{and}\quad \beta_{2} = 3 - \frac{6}{2j+3}. \qquad \text{(xx)}$$

From (xx) it is evident that Type II describes a platykurtic distribution (β₂ < 3) for all positive
values of j; and that the values of the first two Pearson coefficients are identical with those of
the normal when j is indefinitely large. We shall explore its relation to Type III on p. 658
after we have examined the properties of Type VI.

From (xix) we obtain in the usual way the following values for the mean moments of the
more general (Type I) distribution:

$$m_{2} = \frac{a^{2}kj}{(k+j)^{2}(k+j+1)}; \qquad m_{3} = \frac{2a^{3}kj(k-j)}{(k+j)^{3}(k+j+1)(k+j+2)};$$

$$m_{4} = \frac{3a^{4}kj\left[kj(k+j+2) + 2(k-j)^{2}\right]}{(k+j)^{4}(k+j+1)(k+j+2)(k+j+3)}.$$

Fig. 114. Pearson's Type II. The curve shown is y = x^{j−1}(1 − x)^{j−1} ÷ B(j, j).

Whence we have

$$\beta_{1} = \frac{4(k-j)^{2}(k+j+1)}{kj(k+j+2)^{2}};$$

$$\beta_{2} = \frac{3(k+j+1)\left[kj(k+j+2) + 2(k-j)^{2}\right]}{kj(k+j+2)(k+j+3)}.$$

For positive values of k and j greater than unity, we have seen that Type I is a unimodal
distribution. It may then be platykurtic, but will be leptokurtic if very skew. The condition that
β₂ > 3 is evidently

$$(k-j)^{2}(k+j+1) > kj(k+j+2),$$

or, if k = qj,

$$(q-1)^{2}\left[(q+1)j + 1\right] > q\left[(q+1)j + 2\right].$$

Thus, e.g., the curve is leptokurtic if q = 4 when j = 2 so that k = 8. As an exercise the student
may profitably explore the condition that β₂ = 3, and the condition (k = 1) that (xvii) defines
a monotonic increasing function, the mode being then at the upper limit of the range.
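The statement that Type I may be platykurtic or leptokurtic according to its skewness can be explored numerically. The sketch below forms the zero moments of the Beta variate of restricted range from the ratio of descending products a^r (j + r − 1)^(r) ÷ (j + k + r − 1)^(r), converts them to mean moments, and prints the first two Pearson coefficients for a few pairs of exponents. The closed form used as a cross-check is the expression for β₂ that follows algebraically from those mean moments; the function names and the chosen exponents are our own illustrative assumptions.

```python
def descending_product(x, r):
    """x^(r) = x(x - 1)(x - 2)... to r factors, with x^(0) = 1."""
    product = 1.0
    for i in range(r):
        product *= x - i
    return product

def type_i_zero_moment(r, j, k, a=1.0):
    """r-th zero moment of the Beta variate of range 0 to a."""
    return a ** r * descending_product(j + r - 1, r) / descending_product(j + k + r - 1, r)

def type_i_betas(j, k, a=1.0):
    """First two Pearson coefficients computed from the zero moments."""
    mu1, mu2, mu3, mu4 = (type_i_zero_moment(r, j, k, a) for r in (1, 2, 3, 4))
    m2 = mu2 - mu1 ** 2
    m3 = mu3 - 3 * mu2 * mu1 + 2 * mu1 ** 3
    m4 = mu4 - 4 * mu3 * mu1 + 6 * mu2 * mu1 ** 2 - 3 * mu1 ** 4
    return m3 ** 2 / m2 ** 3, m4 / m2 ** 2

def beta2_closed_form(j, k):
    """Closed form for the second coefficient, used only as a cross-check."""
    s = j + k
    return 3 * (s + 1) * (k * j * (s + 2) + 2 * (k - j) ** 2) / (k * j * (s + 2) * (s + 3))

for j, k in ((2, 2), (2, 5), (2, 8)):
    b1, b2 = type_i_betas(j, k)
    shape = "leptokurtic" if b2 > 3 else "platykurtic"
    print(j, k, round(b1, 4), round(b2, 4), round(beta2_closed_form(j, k), 4), shape)
```

With j = 2 the symmetrical case k = 2 and the moderately skew case k = 5 are both platykurtic, whereas k = 8 gives a second coefficient of about 3.49.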

The Type VI distribution. Types I and II define a distribution of restricted range in
virtue of the integral limits of (xvi), i.e. from 0 to a. We now recall the alternative definition
(p. 251, Vol. I) of the Beta function on the right of (xv), which we may write in a more
general form as

$$\frac{k^{n}}{B(m,n)}\int_{0}^{\infty} \frac{x^{m-1}}{(k+x)^{m+n}}\,dx = 1. \qquad \text{(xxi)}$$

Accordingly, we may define a Beta variate of unrestricted range in the positive domain as

$$f(x) = \frac{k^{n}}{B(m,n)}\cdot\frac{x^{m-1}}{(k+x)^{m+n}}. \qquad \text{(xxii)}$$

This is Pearson's Type VI, the special interest of which emerges from a consideration
of its moments. For the rth zero moment of such a distribution we have by definition

$$\mu_{r} = \frac{k^{n}}{B(m,n)}\int_{0}^{\infty} \frac{x^{r+m-1}}{(k+x)^{m+n}}\,dx.$$

To get the integral in the expression on the right into the same form as (xxi) we may put
(r + m) = z, so that m = (z − r) and

$$\mu_{r} = \frac{k^{n}}{B(m,n)}\int_{0}^{\infty} \frac{x^{z-1}}{(k+x)^{z+(n-r)}}\,dx = \frac{k^{r}\,B(r+m,\,n-r)}{B(m,n)}$$

$$= \frac{k^{r}\,\Gamma(r+m)\,\Gamma(n-r)}{\Gamma(m)\,\Gamma(n)} = \frac{k^{r}\,(m+r-1)^{(r)}}{(n-1)^{(r)}}. \qquad \text{(xxiii)}$$

Fig. 115. Pearson's Type VI (Snedecor's F; see p. 700). The curves shown are y = x^{m−1} ÷ [B(m, n)(1 + x)^{m+n}]
for m = 10, n = 5 and for m = 1, n = 3.
In particular when k = 1

$$\mu_{r} = \frac{(m+r-1)^{(r)}}{(n-1)^{(r)}}.$$

The last equation is identical with (xi), i.e. Type VI defines the distribution of a variate which
is the ratio of two independent Chi-Square variates.

In this case, the two Gamma variates of whose ratio Type VI describes the distribution
have the same scalar constant. We may make (xi) more general if we write

$$f(u) = \frac{p^{m}e^{-pu}u^{m-1}}{\Gamma(m)} \quad\text{and}\quad f(v) = \frac{q^{n}e^{-qv}v^{n-1}}{\Gamma(n)}.$$

In this case

$$\mu_{r}(u) = p^{-r}(m+r-1)^{(r)} \quad\text{and}\quad \mu_{-r}(v) = \frac{q^{r}}{(n-1)^{(r)}};$$

$$\therefore\; \mu_{r}(z) = \left(\frac{q}{p}\right)^{r}\frac{(m+r-1)^{(r)}}{(n-1)^{(r)}}. \qquad \text{(xxiv)}$$

This is equivalent to (xxiii) if we write in (xxiv)

$$k = \frac{q}{p}.$$
From our viewpoint the last remarks pinpoint what is important about Pearson's Type VI;
but we may profitably pause to comment on the general properties of the distribution defined
by (xxii), if only because Pearson, to whose genius we owe the technique of curve-fitting by
moments, elected to disguise his types in symbols best fitted to conceal from the student their
most interesting properties. We have indeed exhibited in 6.08 of Vol. I the customary equation
of Type III with origin at the mode, and hence identifiable only at the expense of some effort
with the Chi-Square distributions. It is equally possible to conceal the fact that Type VI is
merely the Beta variate of unrestricted range, if we shift the origin to the left of the point
where the curve starts off by the substitution (k + x) = X and x = (X − k). If we then put
(m + n) = p and (m − 1) = q, we may write (xxii) without any obvious advantage in the form

$$f(X) = \frac{k^{p-q-1}\,(X-k)^{q}}{B(q+1,\,p-q-1)\,X^{p}}.$$

This is the form of Type VI commonly cited in standard works.


What is more worthy of comment is the character of the distribution which (xxii) specifies.
By definition the range is infinite. On differentiating once we obtain

$$D_{x}\,f(x) = \frac{k^{n}}{B(m,n)}\left[(m-1)\,x^{m-2}(k+x)^{-m-n} - (m+n)\,x^{m-1}(k+x)^{-m-n-1}\right].$$

On equating to zero in the usual way we obtain

$$\frac{(m-1)\,x^{m-2}}{(k+x)^{m+n}} = \frac{(m+n)\,x^{m-1}}{(k+x)^{m+n+1}}; \qquad \therefore\; x = \frac{(m-1)k}{n+1}.$$

If m > 1, there is therefore a single mode in the positive domain which confines the variate.
Subject to this condition, f(x) = 0 when x = 0. If m < 1, the curve is J-shaped in the domain
of positive values of x and the turning point lies outside the range of the variate.
The expression J-shaped in this context is current usage; and it is not an altogether happy
one, since the capital J has an upward bend at its extremity. Like the rectangular hyperbola,
the curve under discussion tails off monotonically from a flying start. A J-case of particular
interest arises when m = ½ and k = 1, so that

$$y = \frac{x^{-\frac{1}{2}}}{B(\tfrac{1}{2},n)\,(1+x)^{n+\frac{1}{2}}}. \qquad \text{(xxv)}$$

This we shall see is the parent of Type VII. When the Type VI curve is unimodal in the
positive domain, we may explore its shape by evaluating the first four moments. We may
disregard the scalar constant by putting k = 1, in which case we obtain from (xxiii)

$$\mu_{1} = \frac{m}{n-1}; \qquad \mu_{2} = \frac{m(m+1)}{(n-1)(n-2)};$$

$$\mu_{3} = \frac{m(m+1)(m+2)}{(n-1)(n-2)(n-3)}; \qquad \mu_{4} = \frac{m(m+1)(m+2)(m+3)}{(n-1)(n-2)(n-3)(n-4)}.$$

Hence we obtain the mean moments in the usual way:

$$m_{2} = \frac{m(m+n-1)}{(n-1)^{2}(n-2)}; \qquad m_{3} = \frac{2m(m+n-1)(2m+n-1)}{(n-1)^{3}(n-2)(n-3)};$$

$$m_{4} = \frac{3m(m+n-1)\left[(n+5)\,m(m+n-1) + 2(n-1)^{2}\right]}{(n-1)^{4}(n-2)(n-3)(n-4)}.$$

Thus the first two Pearson coefficients are

$$\beta_{1} = \frac{4(n-2)(2m+n-1)^{2}}{m(n-3)^{2}(m+n-1)};$$

$$\beta_{2} = \frac{3(n-2)}{(n-3)(n-4)}\left[(n+5) + \frac{2(n-1)^{2}}{m(m+n-1)}\right].$$

For positive values of m and n each greater than unity, the distribution, as we have seen it
to be, is unimodal. The second mean moment is infinite if n = 1 or n = 2, the same being
true of the third if n = 1 or n = 2 or n = 3 and of the fourth if n = 1, n = 2, n = 3 or n = 4.
For values of m greater than unity and of n greater than 4, the value of β₂ is greater than 3.
For integral positive values of both m and n Type VI cannot therefore define a platykurtic
distribution.
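The assertion that Type VI cannot be platykurtic may be tested numerically for particular exponents. The sketch below takes the zero moments of the Type VI variate with unit scale in the form (m + r − 1)^(r) ÷ (n − 1)^(r), converts them to mean moments, and prints the first two Pearson coefficients. The pairs of exponents are arbitrary illustrative values with n > 4, so that the fourth moment is finite; the function names are our own.

```python
def descending_product(x, r):
    """x^(r) = x(x - 1)(x - 2)... to r factors, with x^(0) = 1."""
    product = 1.0
    for i in range(r):
        product *= x - i
    return product

def type_vi_zero_moment(r, m, n, k=1.0):
    """r-th zero moment of the Beta variate of unrestricted range; finite only if n > r."""
    return k ** r * descending_product(m + r - 1, r) / descending_product(n - 1, r)

def type_vi_betas(m, n):
    """First two Pearson coefficients from the first four zero moments (n > 4)."""
    mu1, mu2, mu3, mu4 = (type_vi_zero_moment(r, m, n) for r in (1, 2, 3, 4))
    m2 = mu2 - mu1 ** 2
    m3 = mu3 - 3 * mu2 * mu1 + 2 * mu1 ** 3
    m4 = mu4 - 4 * mu3 * mu1 + 6 * mu2 * mu1 ** 2 - 3 * mu1 ** 4
    return m3 ** 2 / m2 ** 3, m4 / m2 ** 2

pairs = ((2, 5), (3, 6), (10, 9), (4, 40))   # illustrative (m, n) pairs
results = {pair: type_vi_betas(*pair) for pair in pairs}
for (m, n), (b1, b2) in results.items():
    print(m, n, round(b1, 4), round(b2, 4))
```

Every second coefficient printed exceeds 3, and it approaches 3 only slowly as n increases.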

Type VII. The distribution last dealt with is that of the variance ratio of Chapter 16, and as
such the basis of Snedecor's F-test. The t-distribution, commonly called Student's in conformity
with the pen-name of its author, W. S. Gosset, is Pearson's Type VII. To clarify its genesis,
we may remind ourselves of the relation between the normal distribution of the c-score of unit
variance and the distribution of its square (C = c²), i.e. Chi-Square for 1 d.f. We may express
this relation by saying that the normal distribution of unit variance is that of the square root
of a Chi-Square variate of 1 d.f., and might permissibly speak of the latter as the parent of the
normal distribution. The particular form Type VI assumes when k = 1 and m = ½ in (xxii)
is the parent of Type VII in the same sense. The Type VI distribution then defines a variate
x = z² such that

$$F(x) = \frac{1}{B(\tfrac{1}{2},n)}\cdot\frac{x^{-\frac{1}{2}}}{(1+x)^{n+\frac{1}{2}}}.$$

This is a monotonic decreasing function of x in the positive range of the distribution, like
Chi-Square for 1 d.f. On the assumption that the distribution of z = x^{1/2} is symmetrical, we have
seen how to obtain the distribution of z from that of x in Example (5) of 15.03, as below.

$$f(z) = x^{\frac{1}{2}}F(x) = \frac{1}{B(\tfrac{1}{2},n)}\cdot\frac{1}{(1+x)^{n+\frac{1}{2}}};$$

$$\therefore\; f(z) = \frac{1}{B(\tfrac{1}{2},n)\,(1+z^{2})^{n+\frac{1}{2}}}. \qquad \text{(xxvi)}$$

This is the definitive equation of the Type VII distribution which is evidently symmetrical about
zero mean. Hence odd mean moments vanish, and we can define the even mean moments by

$$m_{2r} = \frac{2}{B(\tfrac{1}{2},n)}\int_{0}^{\infty} \frac{z^{2r}}{(1+z^{2})^{n+\frac{1}{2}}}\,dz.$$

Fig. 116. Pearson's Type VII (Student's t; see p. 655). The curve shown is y = 1 ÷ [B(½, n)(1 + x²)^{n+½}] with n = 2.

We can get this integral into Type VI shape by the substitution z² = u, so that

$$m_{2r} = \frac{1}{B(\tfrac{1}{2},n)}\int_{0}^{\infty} \frac{u^{r-\frac{1}{2}}}{(1+u)^{n+\frac{1}{2}}}\,du.$$

We now put (r − ½) = (j − 1), so that j = (r + ½), and (k + j) = (n + ½), so that k = (n − r),
whence

$$m_{2r} = \frac{B(r+\tfrac{1}{2},\,n-r)}{B(\tfrac{1}{2},n)} = \frac{\Gamma(r+\tfrac{1}{2})\,\Gamma(n-r)}{\Gamma(\tfrac{1}{2})\,\Gamma(n)} = \frac{(r-\tfrac{1}{2})^{(r)}}{(n-1)^{(r)}}. \qquad \text{(xxvii)}$$

Whence we obtain

$$m_{2} = \frac{1}{2(n-1)}; \qquad m_{4} = \frac{3}{4(n-1)(n-2)}; \qquad m_{6} = \frac{15}{8(n-1)(n-2)(n-3)}.$$
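The Student or t-distribution is a particular case of (xxvi) involving a scalar change, viz.:

$$z = \frac{t}{\sqrt{f}} \quad\text{and}\quad n = \tfrac{1}{2}f.$$

Whence by substitution in (xxvi)

$$F(t) = \frac{1}{\sqrt{f}\,B(\tfrac{1}{2},\tfrac{1}{2}f)}\cdot\frac{1}{\left(1 + \dfrac{t^{2}}{f}\right)^{\frac{1}{2}(f+1)}}. \qquad \text{(xxviii)}$$

For the moments we therefore write

$$m_{2r}(z) = f^{-r}\,m_{2r}(t) \quad\text{or}\quad m_{2r}(t) = f^{r}\cdot m_{2r}(z).$$

Hence from (xxvii)

$$m_{2r}(t) = \frac{f^{r}\,(r-\tfrac{1}{2})^{(r)}}{(\tfrac{1}{2}f-1)^{(r)}}.$$

Whence we obtain

$$m_{2} = \frac{f}{f-2}; \quad m_{4} = \frac{3f^{2}}{(f-2)(f-4)}; \quad m_{6} = \frac{15f^{3}}{(f-2)(f-4)(f-6)}; \quad m_{8} = \frac{105f^{4}}{(f-2)(f-4)(f-6)(f-8)}.$$

Thus we get

$$\beta_{2} = \frac{m_{4}}{m_{2}^{2}} = \frac{3(f-2)}{f-4} = 3 + \frac{6}{f-4};$$

$$\beta_{4} = \frac{m_{6}}{m_{2}^{3}} = \frac{15(f-2)^{2}}{(f-4)(f-6)} = 15 + \frac{30(3f-10)}{(f-4)(f-6)};$$

$$\beta_{6} = \frac{m_{8}}{m_{2}^{4}} = \frac{105(f-2)^{3}}{(f-4)(f-6)(f-8)}.$$

If f > 4 the curve described by (xxviii) is leptokurtic, since β₂ > 3. When f = 2 the variance
is infinite.

It is also evident that the fourth mean moment is infinite if either f = 2 or f = 4. Of more
interest is how closely it approaches the normal when f is large. From the preceding formulae
we derive

    f          β₂       β₄       β₆
    16         3.5      24.5     300.1
    22         3.3      20.8     208.0
    40         3.16     17.7     147.1
    60         3.1      16.7     130.3
    Normal     3.0      15.0     105.0

At the 2σ level the correspondence between the normal and the t-distribution is very close when
f > 50.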
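The approach of the t-distribution to the normal can be followed numerically from its even mean moments. The sketch below assumes the moment formula m₂ᵣ(t) = f^r (r − ½)^(r) ÷ (½f − 1)^(r), which follows from the Type VII moments after the change of scale, and forms the ratios of the fourth, sixth and eighth mean moments to the appropriate powers of the second. The function names are our own, and the formula is finite only when f > 2r.

```python
def descending_product(x, r):
    """x^(r) = x(x - 1)(x - 2)... to r factors, with x^(0) = 1."""
    product = 1.0
    for i in range(r):
        product *= x - i
    return product

def student_even_moment(r, f):
    """Mean moment of order 2r of the t-distribution with f d.f. (requires f > 2r)."""
    return f ** r * descending_product(r - 0.5, r) / descending_product(f / 2 - 1, r)

def student_betas(f):
    """Ratios m4/m2^2, m6/m2^3 and m8/m2^4 (requires f > 8)."""
    m2, m4, m6, m8 = (student_even_moment(r, f) for r in (1, 2, 3, 4))
    return m4 / m2 ** 2, m6 / m2 ** 3, m8 / m2 ** 4

for f in (16, 22, 40, 60, 200, 1000):
    print(f, [round(value, 2) for value in student_betas(f)])
# The limiting (normal) values are 3, 15 and 105.
```

The higher coefficients converge much more slowly than the second, which is why agreement in the tails lags behind agreement near the centre.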
Relation of the Type VI and Type I distributions. We may now define as follows a variate
(w) which exhibits the genetic relationship between Types I and VI, hence also between Types I
or II and Type III, as in Fig. 109:

$$w = \frac{1}{1+z} = \frac{v}{u+v}, \quad\text{where}\quad z = \frac{u}{v}.$$

We can determine the p.d. equation for the score w, if we know that of z, which we may assume
to be a Type VI variate. The derivation proceeds as in Example (2) of 15.03, viz.:

$$F(z) = \frac{1}{B(j,k)}\cdot\frac{z^{j-1}}{(1+z)^{j+k}};$$

$$\therefore\; f(w) = \frac{w^{k-1}(1-w)^{j-1}}{B(j,k)}. \qquad \text{(xxxi)}$$

The variate z = (u ÷ v) is itself a Type VI variate if u and v are independent Type III variates.
If u and v are Type III variates with equal scalar constants k = j, as is true of the Chi-Square
class, the last expression reduces to the Type II form

$$f(w) = \frac{w^{j-1}(1-w)^{j-1}}{B(j,j)}. \qquad \text{(xxxii)}$$

We shall later see that this defines the distribution of the correlation ratio (η) in a homogeneous
normal universe (p. 706).


EXERCISE 15.04 


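1. Show that the ratio (x = A ÷ B) of two Chi-Square variates, A of a degrees of freedom and B
of b degrees of freedom, is a Type VI of the form

$$f(x) = \frac{x^{\frac{1}{2}(a-2)}}{B(\tfrac{1}{2}a,\tfrac{1}{2}b)\,(1+x)^{\frac{1}{2}(a+b)}}.$$

2. If u = bx ÷ a in the above, show that

$$f(u) = \frac{\left(\dfrac{b}{a}\right)^{\frac{1}{2}b} u^{\frac{1}{2}(a-2)}}{B(\tfrac{1}{2}a,\tfrac{1}{2}b)\left(\dfrac{b}{a}+u\right)^{\frac{1}{2}(a+b)}}.$$

3. If z is the reciprocal of u in 2 above, employ the method of 15.02 to show that

$$f(z) = \frac{\left(\dfrac{a}{b}\right)^{\frac{1}{2}a} z^{\frac{1}{2}(b-2)}}{B(\tfrac{1}{2}b,\tfrac{1}{2}a)\left(\dfrac{a}{b}+z\right)^{\frac{1}{2}(a+b)}}.$$

4. Show that the first four zero moments of the Chi-Square variate for f degrees of freedom are

$$\mu_{1} = f; \quad \mu_{2} = f(f+2); \quad \mu_{3} = f(f+2)(f+4); \quad \mu_{4} = f(f+2)(f+4)(f+6).$$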
15.05 THE TABULATION OF THE GAMMA FUNCTION


The mere fact that the incomplete Gamma and Beta functions are suitable to describe unimodal
distributions whose zero moments are always expressible in terms of the complete Gamma
function would not justify their prominent role in statistical theory, if it were not also true that
they satisfy the practical demands of ready tabulation; and we have hitherto (6.05, Vol. I)
indicated the numerical evaluation of Γ(n) only when n is a positive integer or an odd integer
multiple of ½, negative or positive. The aim of Chance and Choice is to put the playing cards
of statistics face upwards on the classroom table; and we should fail to fulfil it if we gave no
indication of how it is possible to construct a table of Γ(n) for all values.

What we may call the official procedure relies on the possibility of expressing the Beta
function as a trigonometrical integral in accordance with (iii) of 6.07 in Vol. I; but it will
suffice for our purpose if we take advantage of the method employed in 6.02 to exhibit the
possibility of evaluating the normal integral and of obtaining the standard score which corresponds
to the so-called probable error. To do this, we made use of the method of integration by series,
a trick which is always justifiable if we can express the integrand as a convergent series, and always
convenient if the series converges rapidly. It is not easy to fulfil the last condition but the
following example will suffice to show that it is possible to evaluate Γ(n) for fractional values of
n other than n = ½. We first note that


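$$B(n,n) = \int_{0}^{1} x^{n-1}(1-x)^{n-1}\,dx$$

$$= \int_{0}^{1} x^{n-1}\left[1 - (n-1)x + \frac{(n-1)^{(2)}}{2!}x^{2} - \frac{(n-1)^{(3)}}{3!}x^{3} + \ldots\right]dx$$

$$= \int_{0}^{1}\left[x^{n-1} - (n-1)x^{n} + \frac{(n-1)^{(2)}}{2!}x^{n+1} - \frac{(n-1)^{(3)}}{3!}x^{n+2} + \ldots\right]dx$$

$$= \left[\frac{x^{n}}{n} - \frac{(n-1)\,x^{n+1}}{n+1} + \frac{(n-1)^{(2)}\,x^{n+2}}{2!\,(n+2)} - \frac{(n-1)^{(3)}\,x^{n+3}}{3!\,(n+3)} + \ldots\right]_{0}^{1}$$

$$= \sum_{r=0}^{\infty} \frac{(-1)^{r}\,(n-1)^{(r)}}{r!\,(n+r)}. \qquad \text{(i)}$$

Let us now suppose that we wish to find the value of Γ(¾) or Γ(− ¼):

$$B(\tfrac{3}{4},\tfrac{3}{4}) = \frac{\Gamma(\tfrac{3}{4})\,\Gamma(\tfrac{3}{4})}{\Gamma(1\tfrac{1}{2})}.$$

Since

$$\Gamma(1\tfrac{1}{2}) = \Gamma(\tfrac{1}{2}+1) = \tfrac{1}{2}\Gamma(\tfrac{1}{2}) = \tfrac{1}{2}\sqrt{\pi};$$

$$\therefore\; \Gamma(\tfrac{3}{4}) = \sqrt{\tfrac{1}{2}\sqrt{\pi}\;B(\tfrac{3}{4},\tfrac{3}{4})}. \qquad \text{(ii)}$$

Likewise

$$\Gamma(\tfrac{3}{4}) = \Gamma(1-\tfrac{1}{4}) = -\tfrac{1}{4}\,\Gamma(-\tfrac{1}{4});$$

$$\therefore\; \Gamma(-\tfrac{1}{4}) = -4\sqrt{\tfrac{1}{2}\sqrt{\pi}\;B(\tfrac{3}{4},\tfrac{3}{4})}. \qquad \text{(iii)}$$

In (i) we now insert the value n = ¾, so that (n + r) = ¼(4r + 3), and

$$(n-1) = -\tfrac{1}{4}; \qquad (n-1)^{(2)} = (-\tfrac{1}{4})(-\tfrac{5}{4}) = \frac{1\cdot 5}{4^{2}}; \qquad (n-1)^{(3)} = -\frac{1\cdot 5\cdot 9}{4^{3}};$$

and in general

$$(n-1)^{(r)} = (-1)^{r}\,4^{-r}\prod_{x=1}^{r}(4x-3). \qquad \text{(iv)}$$

For brevity we may write

$$K_{r} = \frac{\prod_{x=1}^{r}(4x-3)}{4^{\,r-1}\;r!\;(4r+3)}.$$

Thus (i) becomes

$$B(\tfrac{3}{4},\tfrac{3}{4}) = \sum_{r=0}^{\infty} K_{r}. \qquad \text{(v)}$$

The series so defined is convergent. From (v) we obtain

$$K_{0} = \tfrac{4}{3} = 1.33333; \quad K_{1} = \tfrac{1}{7} = 0.14286; \quad K_{2} = \tfrac{5}{88} = 0.05682; \quad K_{3} = \tfrac{1}{32} = 0.03125; \quad K_{4} = 0.02005; \ldots$$

After the first 5 terms the fall-off is very slow. The terms up to r = 12 add up to 1.64067. If
we insert this value and that of √π ≈ 1.7725 in (ii) we get Γ(¾) = 1.21, a result correct to
2 significant figures only, the correct value to 5 significant figures being 1.2254. The sum of
the first 36 terms yields the value 1.2162 which is correct to 3 significant figures.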
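The slow convergence of the series is easily seen by machine. The sketch below sums the series for B(n, n), in the form Σ (−1)^r (n − 1)^(r) ÷ [r!(n + r)], term by term and converts each partial sum into an estimate of Γ(¾) by means of the relation Γ(¾)² = ½√π · B(¾, ¾), comparing it with the value given by the Gamma function of the Python standard library. The function names and the chosen numbers of terms are our own.

```python
import math

def beta_series(n, terms):
    """Partial sum of the series for B(n, n), taking `terms` terms from r = 0."""
    total = 0.0
    coeff = 1.0                      # (-1)^r (n-1)^(r) / r!, starting at r = 0
    for r in range(terms):
        total += coeff / (n + r)
        coeff *= -(n - 1 - r) / (r + 1)
    return total

def gamma_three_quarters(terms):
    """Estimate of Gamma(3/4) from the partial sum of the series for B(3/4, 3/4)."""
    return math.sqrt(0.5 * math.sqrt(math.pi) * beta_series(0.75, terms))

for terms in (5, 13, 37, 1000, 100000):
    print(terms, round(beta_series(0.75, terms), 5), round(gamma_three_quarters(terms), 5))
print("library value:", round(math.gamma(0.75), 5))
```

Since every term is positive when n = ¾, each partial sum underestimates the true value, and the estimate creeps upward only slowly as further terms are added.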


15.06 SCORE-SUM AND MEAN SCORE OF A SAMPLE FROM A GAMMA UNIVERSE


We shall now proceed to establish what is the most important property of a Gamma universe
in connexion with the theory of sample distributions. By a Gamma universe we imply a unit
sample distribution specified by the Gamma variate whose p.d. equation is:

$$f(x) = \frac{k^{n}\,e^{-kx}\,x^{n-1}}{\Gamma(n)}. \qquad \text{(i)}$$

The first zero moment and the mean moments are as derived from (vii) in 15.04, viz.:

$$\mu_{1} = k^{-1}n; \qquad m_{4} = 3k^{-4}n(n+2);$$

$$m_{2} = k^{-2}n; \qquad m_{5} = 4k^{-5}n(5n+6);$$

$$m_{3} = 2k^{-3}n; \qquad m_{6} = 5k^{-6}n(3n^{2}+26n+24).$$
Whence we have

$$\beta_{1} = \frac{4}{n}; \qquad \beta_{3} = \frac{8(5n+6)}{n^{2}}; \qquad \text{(ii)}$$

$$\beta_{2} = 3 + \frac{6}{n}; \qquad \beta_{4} = 15 + \frac{10(13n+12)}{n^{2}}. \qquad \text{(iii)}$$

We now recall results obtained in 14.05 (viii)-(xi). For the Pearson coefficients of the t-fold
sample score-sum we write accordingly:

$${}_{t}\beta_{1} = \frac{\beta_{1}}{t}; \qquad {}_{t}\beta_{2} = 3 + \frac{\beta_{2}-3}{t};$$

$${}_{t}\beta_{3} = \frac{\beta_{3}}{t^{2}} + \frac{10(t-1)}{t^{2}}\,\beta_{1};$$

$${}_{t}\beta_{4} = \frac{\beta_{4} + 15(t-1)\beta_{2} + 10(t-1)\beta_{1} + 15(t-1)(t-2)}{t^{2}}.$$

By substitution from (ii) and (iii) in the above, we get

$${}_{t}\beta_{1} = \frac{4}{nt}; \qquad {}_{t}\beta_{3} = \frac{8(5nt+6)}{(nt)^{2}}; \qquad \text{(iv)}$$

$${}_{t}\beta_{2} = 3 + \frac{6}{nt}; \qquad {}_{t}\beta_{4} = 15 + \frac{10(13nt+12)}{(nt)^{2}}. \qquad \text{(v)}$$

In these expressions nt replaces n of (ii) and (iii) above. Thus they define the first 4 Pearson
coefficients of a unit sample distribution defined by

$$f(x) = \frac{k^{nt}\,e^{-kx}\,x^{nt-1}}{\Gamma(nt)}. \qquad \text{(vi)}$$

This result suggests the following important rule: if the unit sample distribution is that of a Gamma
variate, we obtain the distribution of the score-sum of the t-fold sample by substitution of (tn) for
n in the exponent of the expression definitive of the distribution of the unit sample. If we put k = ½
and n = ½f in (i), we have k = ½ and nt = ½ft in (vi). We may therefore express the preceding
rule by the statement: if the unit sample distribution of a universe is a Chi-Square variate of f
degrees of freedom, that of the t-fold sample score-sum (and hence mean) is a Chi-Square variate
of ft degrees of freedom, and hence the sum (S) of t independent square normal standard scores
is a Chi-Square variate of t degrees of freedom, i.e.

$$f(S) = \frac{e^{-\frac{1}{2}S}\,S^{\frac{1}{2}t-1}}{2^{\frac{1}{2}t}\,\Gamma(\tfrac{1}{2}t)}.$$

The preceding derivation does not constitute a watertight proof of the rule, though it shows
that (vi) is likely to give a very good fitting curve for the t-fold score-sum (S) distribution of a
sample from the universe of which (i) defines that of the unit sample. We can establish in
more than one way the conclusion that all the moments of the S-distribution and hence also
the Pearson coefficients of any order are identical with those of (vi).
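The substitution rule can be illustrated numerically for the first four Pearson coefficients. In the sketch below the unit-sample coefficients of the Gamma universe are taken as 4/n, 3 + 6/n, 8(5n + 6)/n² and 15 + 10(13n + 12)/n². The rules for combining them into the coefficients of the t-fold score-sum are written in the form that follows from the additive property of cumulants; they are our reading of the results recalled from 14.05, not a quotation of them. The function names and the values n = 1.5, t = 4 are illustrative assumptions.

```python
def unit_betas(n):
    """First four Pearson coefficients of a Gamma variate with exponent n - 1."""
    return (
        4 / n,
        3 + 6 / n,
        8 * (5 * n + 6) / n ** 2,
        15 + 10 * (13 * n + 12) / n ** 2,
    )

def score_sum_betas(b1, b2, b3, b4, t):
    """Coefficients of the t-fold score-sum from those of the unit sample (assumed form)."""
    return (
        b1 / t,
        3 + (b2 - 3) / t,
        (b3 + 10 * (t - 1) * b1) / t ** 2,
        (b4 + 15 * (t - 1) * b2 + 10 * (t - 1) * b1 + 15 * (t - 1) * (t - 2)) / t ** 2,
    )

n, t = 1.5, 4                                  # illustrative values only
from_unit = score_sum_betas(*unit_betas(n), t)
by_substitution = unit_betas(n * t)            # the rule: replace n by nt
print(from_unit)
print(by_substitution)
```

The two tuples agree, which is what the rule asserts for these four coefficients; it does not by itself establish agreement for coefficients of higher order.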




We first suppose that A and B are Gamma variates with the same scalar constant R, 


defined by 


po m 
F(A) = Fo T and FL = a igs | ae 


<. (A) = k(n +r —1) and p(B) =k *(m +r — 1)”. 


For the moments of the score-sum distribution of independent samples whose distributions 
conform to the above we have y : 


z=Y 


p (A + B) = EA + BY = E "(2 H(A") . E(B™), 


0 lA +B) = Y Toy Hal) «A 8) 


t=f 


=p > T(n +x — DU (m +r — x — ">", 
x=0 


EE Pe) AN 
pun cata e ee 


o yl (r — x)! 


In the notation of Chapter 1 of Vol. I this expression involves a product of Figurates, viz. : 

$$k^{-r}\, r! \sum_{x=0}^{x=r} {}^{x}F_{n} \cdot {}^{r-x}F_{m}.$$

Whence from (x) in 11.07 

$$\mu_r(A + B) = k^{-r}\, r!\; {}^{r}F_{n+m} = k^{-r}(n + m + r - 1)^{(r)}.$$


The above expression defines the moments of the distribution of a score C(= A + B) defined 
by the Gamma variate 

$$F(C) = \frac{k^{n+m}}{\Gamma(n + m)}\, e^{-kC}\, C^{n+m-1}.$$

Thus the score-sum of a Γ(n) and of a Γ(m) variate is a Γ(n + m) variate. Hence that of 
two Γ(n) variates as defined by (i) is that of a Γ(2n) variate, that of three Γ(n) variates is a Γ(3n) 
variate and so on. 

Alternatively, we may reach the same result by recourse to the moment generating function 
of the Gamma variate. If f(x) is the p.d. of a variate x whose range is from 0 to ∞, the m.g.f. 
of the distribution is given by 

$$G(t) = \int_0^{\infty} e^{tx} f(x)\, dx.$$

For the m.g.f. of the unit sample distribution defined by (i) above, we therefore have 

$$G(t) = \frac{k^{n}}{\Gamma(n)} \int_0^{\infty} e^{-(k-t)x}\, x^{n-1}\, dx = \frac{k^{n}}{\Gamma(n)} \cdot \frac{\Gamma(n)}{(k - t)^{n}} = \frac{k^{n}}{(k - t)^{n}} = \left(1 - \frac{t}{k}\right)^{-n}.$$

The m.g.f. of the s-fold score-sum sample distribution in accordance with (iii) of 14.02 is therefore 

$$G_s(t) = \left(1 - \frac{t}{k}\right)^{-sn}.$$

We obtain the general expression for the moments of the distribution of the score-sum of 
the s-fold sample as in 14.04 by recourse to the relation 

$$\mu_r = \frac{d^{r}}{dt^{r}}\, G_s(t) \quad \text{when } t = 0.$$

By successive differentiation we have 

$$\frac{d^{r}}{dt^{r}}\, G_s(t) = (sn + r - 1)^{(r)}\, k^{sn}\, (k - t)^{-(sn+r)},$$

$$\therefore\ \mu_r = k^{-r}(sn + r - 1)^{(r)}.$$

This defines the moments of the distribution in which sn replaces n of (i) above. Thus the score- 
sum of an s-fold sample from a Γ(n) universe is a Γ(sn) variate. 
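
The additive rule lends itself to a simple arithmetical check. The sketch below, written in Python, is an illustration only: the function names and the trial values of n, m and k are assumptions made for the purpose. It builds the moments of A + B from the binomial expansion on the assumption of independence, and compares them with the moments of a Γ(n + m) variate, using the fact that the r-th moment about zero of a Γ(n) variate with scalar constant k is Γ(n + r) ÷ [Γ(n) . k^r], which is the same thing as k^(-r) (n + r - 1)^(r).

```python
from math import comb, gamma, isclose


def gamma_raw_moment(n, k, r):
    """r-th moment about zero of a Gamma variate with exponent n and scalar constant k."""
    return gamma(n + r) / (gamma(n) * k ** r)


def sum_moment_if_independent(n, m, k, r):
    """r-th moment of A + B from the binomial expansion, assuming A and B independent."""
    return sum(
        comb(r, x) * gamma_raw_moment(n, k, x) * gamma_raw_moment(m, k, r - x)
        for x in range(r + 1)
    )


# Illustrative values only: n and m need not be whole numbers.
n, m, k = 1.5, 2.0, 0.5
for r in range(1, 7):
    from_expansion = sum_moment_if_independent(n, m, k, r)
    from_gamma_sum = gamma_raw_moment(n + m, k, r)
    print(r, from_expansion, from_gamma_sum, isclose(from_expansion, from_gamma_sum))
```

With k = ½ the trial values correspond to Chi-Square variates of 3 and 4 degrees of freedom, so that the agreement of the two columns illustrates the additive property for a sum of 7 degrees of freedom.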


The so-called Chi-Square Test. It will forestall misunderstanding at a later stage if we take this 
opportunity to comment on the use of the expression Chi-Square test, recalling earlier remarks on the 
c-test. In different chapters of Vol. I we have referred to a c-test with judicious use of the indefinite 
article. A c-test is a test we can rightly apply to a score whose distribution is approximately normal ; 
but whether a score distribution approximates to the normal form and with what order of precision is a 
matter for separate enquiry with due regard to the nature of the score. The normal curve is what the 
Herbals describe as a protean genus which turns up in unlikely localities ; and the rationale for invoking 
it in one context, e.g. quality control, has no intrinsic relation to the reason for enlisting its aid in another, 
e.g. the proportionate score difference of two sub-samples. 

Similar remarks apply with equal force to what many current treatises refer to as the Chi-Square 
(χ²) test. The expression recalls a well-known comment on the Lord Privy Seal, a personage who is not 
necessarily a lord, never a privy and in no sense a seal. A species of the Pearson Type III genus turns 
up as a sampling distribution in diverse situations for reasons just as diverse ; and tables of the 
corresponding integral are in use to test many different hypotheses. The statistic commonly referred 
to as Chi-Square is in fact a sum of squares, and is not itself a square, except when we speak as below 
of Chi-Square for 1 degree of freedom. 


* * * * * * * 


Pooling Data. The additive property of the Chi-Square variate set forth in this section 
calls for a caveat w.r.t. a recipe sometimes, and in the opinion of the writer wrongly, cited for 
situations in which the end in view is to assess the significance of an assemblage of normal tests 
the result of none of which is highly significant per se. We have seen that a normal test, i.e. 
a c-test, is on all fours with a Chi-Square test for 1 d.f. if we use the table of the latter to assess 
the probability of getting a score as great as $c^2 = C$; but we have to take something for granted 
when we do this. By squaring c we eliminate the sign difference. Hence assessing the probability 
that the sum of a set of c-scores (e.g. $d_1 = 1.5\sigma_1$, $d_2 = 2.1\sigma_2$, $d_3 = 1.9\sigma_3$, etc.), each referable 
to a difference in the same direction though each or most of them numerically too low to 
inspire confidence, will attain its particular numerical value is not on all fours with assessing the 
probability that a sum of Chi-Square variates will have a particular numerical value. 

In such a situation, elementary considerations may supply the answer we seek. For instance, 
we may suppose that we perform six experiments or make six sets of observations involving a 
difference, e.g. percentage of persons not attacked in an epidemic when treated in one or other 
of two ways. The result of all six observations might record a difference in favour of one treat- 
ment rather than the other, but no such difference might be large enough in comparison with its 
own estimated s.d. to constitute what we usually regard as a significant result. In this case we 
may argue as follows. To say that there is no true difference signifies that a difference in one 
direction or the other is equally likely on any one occasion. Thus our null hypothesis assigns 
$p = \tfrac{1}{2}$ as the probability of getting a difference whose sign is positive in a single trial. The 
likelihood of getting six out of six results of this sort is $(\tfrac{1}{2})^6$, i.e. the adverse odds are 63 : 1. This 
is an exacting test to apply to the pooled data, and might fail to restore confidence. If so, we 
may proceed as follows. 

On the assumption that $c_1 = (d_1 \div \sigma_1)$, $c_2 = (d_2 \div \sigma_2)$, etc. are approximately distributed 
as normal variates of unit variance, their sum $s = (c_1 + c_2 \ldots c_n)$ is itself a normal variate of 
variance n, so that $c_s = (s \div \sqrt{n})$ is itself a normal variate of unit variance. Since 
also the difference between two normal variates is a normal variate whose variance is the 
same as that of their sum, the validity of this procedure does not presuppose that every difference 
has the same sign. If our pool includes one or more negative values, the value of $c_s$ will of course 
be smaller than it could otherwise be; but this does not affect the rationale of the c-test. 
Evidently, this is not true of the sum of the square standard scores, i.e. the sum of Chi-Square, 
the value of which will be exactly the same if all the differences point in one direction and if 
equal numbers point one way or the other. 
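
A short numerical sketch in Python may help to fix the distinction. The six critical ratios below are hypothetical figures chosen for illustration, and the function names are likewise assumptions of the sketch. It contrasts three ways of treating the pool: the sign count, the pooled ratio $c_s$ and the sum of square scores referred to Chi-Square for 6 d.f., first when all six differences point the same way and then when three of them are reversed.

```python
from math import erfc, exp, sqrt


def normal_upper_tail(c):
    """Probability that a unit normal score is as great as c (one direction only)."""
    return 0.5 * erfc(c / sqrt(2))


def chi_square_upper_tail_even(value, f):
    """Probability that Chi-Square for an even number f of d.f. is as great as value."""
    if f % 2:
        raise ValueError("this closed form holds only for an even number of d.f.")
    half = value / 2
    term = total = 1.0
    for j in range(1, f // 2):
        term *= half / j
        total += term
    return exp(-half) * total


def pooled_c(scores):
    """Pooled critical ratio: sum of the c-scores divided by the root of their number."""
    return sum(scores) / sqrt(len(scores))


# Hypothetical critical ratios, none very large taken singly.
same_sign = [1.5, 2.1, 1.9, 1.2, 1.7, 1.4]
mixed_sign = [1.5, -2.1, 1.9, -1.2, 1.7, -1.4]

sign_test = 0.5 ** len(same_sign)  # six like signs out of six
print("sign count:", sign_test)
for label, scores in (("same sign", same_sign), ("mixed sign", mixed_sign)):
    c_s = pooled_c(scores)
    sum_of_squares = sum(c * c for c in scores)
    print(label, c_s, normal_upper_tail(c_s),
          sum_of_squares, chi_square_upper_tail_even(sum_of_squares, len(scores)))
```

The pooled ratio falls sharply when half the signs are reversed, whereas the sum of squares, and hence the Chi-Square probability, is the same for both sets.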


Distribution of the Mean Score. If S in (vii) is the score-sum of the t-fold sample, the mean 
score (M) is given by S = tM. We may obtain the p.d. equation of the mean score by recourse 
to the substitution of Case I in 15.02, viz. : 

$$F(M) = f(S) \cdot \frac{dS}{dM} = t \cdot f(S).$$

Thus (vii) becomes 

$$F(M) = \frac{t^{\frac{1}{2}t}}{2^{\frac{1}{2}t}\,\Gamma(\tfrac{1}{2}t)}\, e^{-\frac{1}{2}tM}\, M^{\frac{1}{2}(t-2)} \qquad \text{(viii)}$$

The last equation defines the distribution of the mean value of t independent square scores. 


EXERCISE 15.06 


1. Examine the results of pooling the following data cited by Major Greenwood (Epidemics and 
Crowd Diseases) in connexion with the possibility that a summer attack of influenza conferred immunity 
during an autumn epidemic among schoolboys : (a) by pooling all the raw data ; (b) by pooling the critical 
ratios ; 


Eton. 393 attacked in summer, of these 29 (7.4 per cent.) attacked in autumn ; 360 not attacked 
in summer, of these 172 (47.8 per cent.) attacked in autumn. 


Harrow. 90 attacked in summer, of these 29 (32.0 per cent.) attacked in autumn ; 339 not attacked 
in summer, of these 258 (76.1 per cent.) attacked in autumn. 


Clifton. 162 attacked in summer, of these 22 (13.6 per cent.) attacked in autumn ; 289 not attacked 
in summer, of these 99 (34.3 per cent.) attacked in autumn. 


Haileybury. 180 attacked in summer, of these 41 (22.8 per cent.) attacked in autumn; 335 not 
attacked in summer, of these 73 (21.8 per cent.) attacked in autumn. 


2. In the same way analyse the following data compiled by Greenwood, w.r.t. efficacy of prophy- 
lactic inoculation, from official returns of the Western Front 1914-18. 


Incidence Death-rate Case Mortality Number of Cases 
per 1000 per 1000 per 1000 ; 
Year -: Disease A | | 
Pro- Unpro- Pro- Unpro- Pro- Unpro- Pro- Unpro- 
tected tected tected tected tected tected tected tected 
Typhoid -= — — — 5:8 17-3 51 202 
1914 Para A — — — —— -— — a 5 
Para B — — —— a — 3-2 — 31 
Typhoid 0-93 8-1 0-007 1:8 7:5 23:2 517 288 
1915 Para A — + 04 o 0-003 — 0:7 =- 281 
Para B O A | — 0-03 — 1-9 — 1043 
Typhoid 0-57 0-51 0-009 | 0-04 1-58 8-33 693 36 
1916 Para A 0-21 3:19 0-003 0-05 1-56 1-78 256 224 
Para B 0-3 ; 82 0-002 0:07 0-82 0-77 362 647 
3 Typhoid 0-104 1-09 0-008 0:13 7:7 ein E 33 
1917 Para A 0-07 1-12 0-000 0-03 — 2-93 139 34 
1 Para B 0-18 4-14 0-003 0:13 1-7 3:20 346 t25 


3. From the same source we obtain the following figures for effects of inoculation against enteric 
fever of a regiment (17th Lancers) in Meerut : 


1907 1909 

No. inoculated ; A ; : . 430 460 
No. untreated ; ; : ; O 160 
Toal No : : i sE ADD 620 
Cases among inoculated . i ; í 13 18 
ditto untreated . TE ; 95 96 
Deaths among inoculated ; : ; 1 2 
ditto untreated . : 3 13 18 


15.07 THE INDEPENDENCE CONDITION 


We have assumed that an f-fold sample from a universe in the context of the foregoing theorem 
signifies the same thing as f independently selected score values. Our conclusion is therefore 
that the score-sum of f independent Chi-Square variates of 1 d.f. is a Chi-Square variate of f 
degrees of freedom. A question fundamental to the rationale of the significance tests we shall 
later examine is whether we can assume that f Chi-Square variates of 1 d.f. are independent if 
their score-sum is a Chi-Square variate of f degrees of freedom. When we say that R. A. Fisher 
gave the first rigorous proof of the so-called Student distribution, this is the pivotal issue. For 
Gosset implicitly assumed that two variates of zero covariance are independent. We have seen 
that this is not so, though the converse is true, that independence implies zero covariance. To 
say that two variates a and b are strictly independent implies that the covariance of any integral 
power of a and any integral power of b is zero, i.e. for all whole number values of n and m 

$$\text{Cov}\,(a^{n}, b^{m}) = 0.$$
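
A toy universe, which is an assumption of this sketch and not one of the models of 12.07, shows in a few lines of Python how zero covariance may coexist with complete dependence. The score a takes the values -1, 0, +1 with equal frequency and b is its square, so that b is wholly determined by a. The covariance of a and b is zero, but that of a² and b is not.

```python
from fractions import Fraction

# Toy universe (an illustrative assumption): a is -1, 0 or +1 with equal
# probability, and b = a * a is completely determined by a.
values = [-1, 0, 1]
p = Fraction(1, 3)


def expect(g):
    """Expected value of g(a) over the toy universe."""
    return sum(p * g(a) for a in values)


def cov_powers(n, m):
    """Cov(a**n, b**m) where b = a * a."""
    joint = expect(lambda a: a ** n * (a * a) ** m)
    return joint - expect(lambda a: a ** n) * expect(lambda a: (a * a) ** m)


print("Cov(a, b)   =", cov_powers(1, 1))
print("Cov(a^2, b) =", cov_powers(2, 1))
```

Exact fractions are used so that a zero covariance appears as zero and not as a rounding residue.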


Our models of 12.07 have shown that Cov (a, b) = 0 does not necessarily imply that this is 
so ; but common sense suffices to justify the conclusion that Cov $(a^{n}, b^{m})$ of 11.02, being an index 
of whether high values of a more often than otherwise correspond with high or low values of 
b, must have the same sign regardless of the numerical value of n or m, unless zero. Let us 
therefore examine the implications of deriving the result obtained in 15.04 by the method of 
moments without assuming independence. If we do assume independence we write 

$$E(a^{x} \cdot b^{k-x}) = E(a^{x}) \cdot E(b^{k-x}) = \mu_x(a) \cdot \mu_{k-x}(b).$$

Otherwise we must put 

$$E(a^{x} \cdot b^{k-x}) = \text{Cov}\,(a^{x}, b^{k-x}) + E(a^{x}) \cdot E(b^{k-x})$$

$$= \text{Cov}\,(a^{x}, b^{k-x}) + \mu_x(a) \cdot \mu_{k-x}(b).$$


If we do not assume their independence we must therefore write the moments of the dis- 
tribution of the score-sum of two variates a and b in the form : 

$$\mu_k(a + b) = E(a + b)^{k} = \sum_{x=0}^{x=k} k_{(x)}\, E(a^{x} \cdot b^{k-x})$$

$$= \sum_{x=0}^{x=k} k_{(x)}\, \mu_x(a) \cdot \mu_{k-x}(b) + \sum_{x=0}^{x=k} k_{(x)}\, \text{Cov}\,(a^{x}, b^{k-x}).$$


Hence, if Cov $(a^{x}, b^{k-x}) > 0$ for some value of x, there must be some value of k such that 

$$\mu_k(a + b) > \sum_{x=0}^{x=k} k_{(x)}\, \mu_x(a) \cdot \mu_{k-x}(b).$$

Similarly, if Cov $(a^{x}, b^{k-x}) < 0$ for some value of x, there must be some value of k such that 

$$\mu_k(a + b) < \sum_{x=0}^{x=k} k_{(x)}\, \mu_x(a) \cdot \mu_{k-x}(b).$$

Hence it must always be true that Cov $(a^{x}, b^{k-x}) = 0$ if 

$$\mu_k(a + b) = \sum_{x=0}^{x=k} k_{(x)}\, \mu_x(a) \cdot \mu_{k-x}(b).$$

If a and b are Chi-Square variates of m and n degrees of freedom, the expression on the right 
defines the moments of a Chi-Square distribution of (m + n) degrees of freedom. Hence the 
distribution of the sum of Chi-Square variates of m and n degrees of freedom respectively will 
be a Chi-Square variate of (m + n) degrees of freedom, if, and only if, a and b are independent, 
and we may extend this conclusion by iteration to the sum of any number of Chi-Square 
variates. 


15.08 THE CHI-SQUARE TABLE 


If x is a Type III variate the probability ($P_a$) that its value will lie in the range from 0 to a is 
given by 

$$P_a = \frac{k^{n}}{\Gamma(n)} \int_0^{a} e^{-kx}\, x^{n-1}\, dx \qquad \text{(i)}$$

The probability that x will be equal to or greater than a is given by 

$$1 - P_a = \frac{k^{n}}{\Gamma(n)} \int_a^{\infty} e^{-kx}\, x^{n-1}\, dx \qquad \text{(ii)}$$

When $k = \tfrac{1}{2}$ and $n = \tfrac{1}{2}f$ we speak of the Type III variate as Chi-Square for f degrees of freedom, 
and Elderton's tables for a particular value of a and f cite the numerical value of the integral on 
the right of (ii). 
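To obtain numerical values of $(1 - P_a)$ in (ii) we may proceed to evaluate $P_a$ as follows : 

$$P_a = \frac{k^{n}}{\Gamma(n)} \int_0^{a} x^{n-1} \left[1 - kx + \frac{(kx)^2}{2!} - \frac{(kx)^3}{3!} \ldots \right] dx$$

$$= \frac{k^{n}}{\Gamma(n)} \left[\frac{x^{n}}{n} - \frac{k\,x^{n+1}}{n+1} + \frac{k^2 x^{n+2}}{2!\,(n+2)} - \frac{k^3 x^{n+3}}{3!\,(n+3)} \ldots \right]_0^{a}.$$

This expression vanishes at the lower limit, and we may simplify it by putting b = ka, so that 
the right hand side becomes 

$$\frac{b^{n}}{\Gamma(n)} \left[\frac{1}{n} - \frac{b}{n+1} + \frac{b^2}{2!\,(n+2)} - \frac{b^3}{3!\,(n+3)} + \frac{b^4}{4!\,(n+4)} - \frac{b^5}{5!\,(n+5)} \text{ etc.}\right]$$

$$= \frac{b^{n}}{\Gamma(n)} \left[\frac{n + 1 - nb}{n(n+1)} + \frac{b^2(3n + 9 - nb - 2b)}{3!\,(n+2)(n+3)} + \frac{b^4(5n + 25 - nb - 4b)}{5!\,(n+4)(n+5)} + \ldots \right].$$

For the Chi-Square variate f = 2n and $k = \tfrac{1}{2}$ so that a = 2b, the degrees of freedom being f, 
the above becomes 

$$P_a = \frac{(\tfrac{1}{2}a)^{\frac{1}{2}f}}{\Gamma(\tfrac{1}{2}f)} \left[\frac{2f + 4 - af}{f(f+2)} + \frac{a^2(6f + 36 - af - 4a)}{24\,(f+4)(f+6)} + \frac{a^4(10f + 100 - af - 8a)}{1920\,(f+8)(f+10)} \text{ etc.}\right] \qquad \text{(iii)}$$

When f = 4 (Chi-Square for 4 d.f.) : 

$$P_a = \frac{a^2}{4} \left[\frac{3 - a}{6} + \frac{a^2(15 - 2a)}{480} + \frac{a^4(35 - 3a)}{80640} \ldots \right].$$

If a = 3 in the above 

$$P_a = \frac{9}{4} \left[0 + \frac{81}{480} + \frac{2106}{80640} \ldots \right].$$

The series involved converges rapidly, and if we take the first 3 terms only we get $P_a$ = 0.4385, 
so that $(1 - P_a)$ in (ii) has the value 0.5615. On taking 4 terms we get $P_a$ = 0.4421 and 
$(1 - P_a)$ = 0.5579. The table gives 0.557825. The error involved in rejecting all terms after 
the first 3 is in this case 0.0037 for 0.5578, i.e. less than 1 per cent. 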
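
The arithmetic of the foregoing example is easy to repeat by machine. The following Python sketch is an illustration under stated assumptions: the function name `p_below` and its argument `pairs` are inventions of the sketch, and a "term" is taken to mean one of the bracketed pairs of the series above. It sums the series for f = 4 and a = 3, and sets the result beside the closed form available when f = 4, namely $1 - P_a = e^{-\frac{1}{2}a}(1 + \tfrac{1}{2}a)$.

```python
from math import exp, factorial, gamma


def p_below(a, f, pairs):
    """P_a for Chi-Square of f d.f., using the first `pairs` paired terms of the series."""
    n, b = f / 2, a / 2
    total = 0.0
    for j in range(2 * pairs):
        # j-th term of the alternating series: (-b)^j / (j! (n + j))
        total += (-1) ** j * b ** j / (factorial(j) * (n + j))
    return b ** n / gamma(n) * total


three = p_below(3, 4, 3)
four = p_below(3, 4, 4)
exact_upper = exp(-1.5) * (1 + 1.5)  # closed form of (1 - P_a) for 4 d.f. at a = 3

print("three paired terms:", three, 1 - three)
print("four paired terms: ", four, 1 - four)
print("closed form 1 - P: ", exact_upper)
```

The partial sums agree with the figures quoted in the text to within a unit in the fourth decimal place, and the closed form reproduces the table entry.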

To use the Chi-Square tables intelligently, and especially in connexion with confidence 
ranges of variance estimates (vide 16.04), it is important to recognise the implications of the fact 
that the range of the Chi-Square variates is wholly positive. The expected, i.e. mean, value is 
given by (vii) of 15.04 by substitution of $k = \tfrac{1}{2}$ and f = 2n, viz.: 

$$\frac{n}{k} = \tfrac{1}{2}f \div \tfrac{1}{2} = f \qquad \text{(iv)}$$

Thus the mean value of the Chi-Square variate for f degrees of freedom is f. It may, of 
course, happen that an observed value (a) of a score distributed like the Chi-Square variate is 
less than f. We are then interested to know the probability of getting a score value as small as 
or smaller than the observed one. If so, the question we are asking is what fraction of the total 
area of the curve lies within the range from 0 to a, i.e. the value of $P_a$ in (i). Since the table 
cites the value of $1 - P_a$, we therefore subtract the tabular value of $(1 - P_a)$ from unity to 
obtain $P_a$. 

In the numerical example last cited f = 4, hence a = 3 is less than the expected value, and 
$P_a$ is the required probability. That is to say, we look up $(1 - P_a)$ = 0.557825 and derive 
$P_a$ = 0.442175. 

One case calls for special comment. The Chi-Square variate (C) for 1 d.f. is the square 
normally distributed score ($c^2$) of unit variance. In this case the table entry for the mean value 
of Chi-Square defines the probability that the standard score will be as great as unity, i.e. that it 
will lie outside the range $\pm\sigma$. Similarly, the table entry against a value of Chi-Square equal to 
4 defines the probability of getting a standard score outside the range $\pm 2\sigma$. In other words, 
the values of $(1 - P_a)$ given by the table of Chi-Square for 1 d.f. refer to the modular as opposed 
to the vector likelihood (see Chapter 5, Vol. I) of the relevant occurrence. This is what we want 
to know if the question takes the form: is B different from A? At least as often the question 
which concerns us most is whether B (the larger value) is really greater than A. If so, the sign 
of the difference is material and the appropriate likelihood is $\tfrac{1}{2}(1 - P_a)$, i.e. half the table entry 
for the appropriate value of Chi-Square. 
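
The correspondence between the Chi-Square table for 1 d.f. and the normal table can be exhibited numerically. In the Python sketch below, offered as an illustration with function names of our own choosing, the Chi-Square tail is computed from the series of this section with $n = \tfrac{1}{2}$ and $k = \tfrac{1}{2}$, while the normal tails come independently from the error function of the standard library.

```python
from math import erfc, factorial, gamma, sqrt


def chi_square_upper(a, f, terms=60):
    """(1 - P_a) for Chi-Square of f d.f., from the series for P_a with b = a / 2."""
    n, b = f / 2, a / 2
    series = sum((-1) ** j * b ** j / (factorial(j) * (n + j)) for j in range(terms))
    return 1 - b ** n / gamma(n) * series


def normal_outside(c):
    """Modular likelihood: a unit normal score lies outside the range -c to +c."""
    return erfc(c / sqrt(2))


def normal_above(c):
    """Vector likelihood: a unit normal score is as great as +c."""
    return 0.5 * erfc(c / sqrt(2))


for chi_square_value in (1.0, 4.0):
    c = sqrt(chi_square_value)
    print(chi_square_value,
          chi_square_upper(chi_square_value, 1),  # table entry for 1 d.f.
          normal_outside(c),                      # same thing from the normal curve
          normal_above(c))                        # half the table entry
```

The first two columns of figures agree, and the third is half of either, as the text prescribes when the sign of the difference is material.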


CHAPTER 16 


SIGNIFICANCE TESTS FOR ANALYSIS OF 
VARIANCE 


16.01 THE VARIANCE PROBLEM 


In Chapter 13 we explored the possible breakdown of a single sample w.r.t. particular criteria of 
classification into subsamples with a view to deciding whether variation inter se is consistent with 
the possibility of each being a set from one and the same universe. Our enquiry led us to 
formulate different estimates of the variance of the score distribution in the putative common 
universe ; and currently prescribed assessment of the credentials of the null hypothesis 
depends on the consistency of such estimates. So far we have merely asked the question: 
what statistics of such a set-up must be consistent? We have now to define the criteria of 
consistency in such a context more explicitly. 

To say that two estimates are consistent in the most literal sense of the term means that their 
difference is zero or that their ratio is unity. When a null hypothesis postulates consistency of 
two estimates, the hitherto customary test procedure deems agreement to be satisfactory, if 
the deviation of the observed difference or the observed ratio from its mean value prescribed 
by the hypothesis is not excessive. This presupposes that we can define the distribution of 
one or the other. That of the ratio has an advantage by no means obvious at first sight. 
Since the class of significance tests we are about to examine relies on the distribution of ratios, 
a brief digression to explain it will be necessary. 

Let us suppose that A and B are two variates whose distribution involves an unknown 
parameter (K), e.g. the true variance in the case of a normal universe of which our only knowledge 
comes from two samples. It may then be possible to specify exactly the distribution of K.A 
and K.B. If the distribution of the difference (A − B) is expressible in terms of the distribu- 
tions of A and B, we can of course define the distribution of (K.A − K.B) = K(A − B); but 
we cannot define the distribution of (A − B) in numerical terms unless we know the value of 
K. On the other hand, it may happen that we can derive the distribution of the ratio of K.A 
to K.B; and if so, we can define the distribution of the ratio (A ÷ B) = (K.A ÷ K.B). 
Thus the derivation of the explicit distribution of a ratio may be possible in the absence of 
information necessary for defining the precise distribution of a difference. 

Such is the situation we have to deal with when our concern is the consistency of statistics 
involving sums of normally distributed square scores. We have elsewhere (14.02, 15.02) seen 
that we can define the distribution of the sum of normally distributed square scores of zero 
mean and unit variance, i.e. the ratio of square score deviations from an unknown true mean to 
an unknown true variance of the parent universe. For the reason stated, the fact that we do 
not know the true variance of the universe distribution gives rise to no difficulty if the distribution 
of their ratio rather than the distribution of their difference is what concerns us. The fact that 
we do not need to know the true mean will appear from considerations which follow. 

Suppose $x_r$ is any unit score from a normal universe, that its expected value is M and that 
the variance of the unit sample distribution is $\sigma^2$. We then define a square standard score as 

$$c_r^2 = \frac{(x_r - M)^2}{\sigma^2} \qquad \text{(i)}$$

The sum of such scores is 

$$\sum_{r=1}^{r=n} c_r^2 = \sum_{r=1}^{r=n} \frac{(x_r - M)^2}{\sigma^2} \qquad \text{(ii)}$$

If each score $x_r$ is referable to an independent unit sample, we have seen (15.07) that the dis- 
tribution of such a sum is a Chi-Square variate with n degrees of freedom, i.e. that its p.d. 
equation is 

$$F(S) = \frac{1}{2^{\frac{1}{2}n}\,\Gamma(\tfrac{1}{2}n)}\, e^{-\frac{1}{2}S}\, S^{\frac{1}{2}(n-2)} \qquad \text{(iii)}$$

In (i) M is an unknown parameter of the parent universe. In practice, we have only an estimate 
of it, i.e. the sample mean $M_o$ ; and the sum of square deviations based thereon is 

$$\sum_{r=1}^{r=n} \frac{(x_r - M_o)^2}{\sigma^2} = \frac{n V_o}{\sigma^2} \qquad \text{(iv)}$$

The expression on the left is not stated in terms of standard scores, but we can define it as 
follows : 

$$\sum_{r=1}^{r=n} \frac{(x_r - M)^2}{n} = \sum_{r=1}^{r=n} \frac{x_r^2}{n} - 2M \cdot M_o + M^2$$

$$= \sum_{r=1}^{r=n} \frac{x_r^2}{n} - M_o^2 + (M_o^2 - 2M \cdot M_o + M^2)$$

$$= V_o + (M_o - M)^2,$$

$$\therefore\ n V_o = \sum_{r=1}^{r=n} (x_r - M)^2 - n(M_o - M)^2,$$

$$\frac{n V_o}{\sigma^2} = \sum_{r=1}^{r=n} \frac{(x_r - M)^2}{\sigma^2} - \frac{n(M_o - M)^2}{\sigma^2} \qquad \text{(v)}$$


Now the mean of the n-fold sample from a normal universe is itself a normal variate, and the 
variance of its distribution is 

$$\sigma_m^2 = \frac{\sigma^2}{n}.$$

Whence we can write (v) in the form 

$$\frac{n V_o}{\sigma^2} = \sum_{r=1}^{r=n} \frac{(x_r - M)^2}{\sigma^2} - \frac{(M_o - M)^2}{\sigma_m^2} \qquad \text{(vi)}$$

In this expression the first term on the right is a Chi-Square variate of n degrees of freedom. 
Since the mean score of a random sample from a normal universe is also a normal variate, the 
second term is a square normal score of unit variance and is therefore a Chi-Square variate of 
1 d.f. Thus we have expressed in (vi) the sum of n square deviations from the sample mean 
as the difference between two Chi-Square variates. Thus the fact that we do not know the value 
of the true mean M need not trouble us, if we can formally define the distribution of the difference 
of two such Chi-Square variates. To do so, it is customary to rely on mathematical techniques 
which involve an understanding of matrix algebra and the manipulation of multiple integrals ; but 
it is possible to exhibit the argument at a more elementary level, if our approach is heuristic. 
We first explore the result of expressing (vi) in terms of normal scores of unit variance such 
as (i) above. We then put 


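$$c_r = \frac{x_r - M}{\sigma} \quad \text{and} \quad x_r = M + \sigma \cdot c_r,$$

$$\therefore\ \sum_{r=1}^{r=n} x_r = nM + \sigma \sum_{r=1}^{r=n} c_r,$$

$$n(M_o - M) = \sigma \sum_{r=1}^{r=n} c_r,$$

$$\frac{n(M_o - M)^2}{\sigma^2} = \frac{1}{n} \left[\sum_{r=1}^{r=n} c_r\right]^2.$$

Hence we may write (v) in the form 

$$\frac{n V_o}{\sigma^2} = \sum_{r=1}^{r=n} c_r^2 - \frac{1}{n} \left[\sum_{r=1}^{r=n} c_r\right]^2 \qquad \text{(vii)}$$
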
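The two Chi-Square variates on the right of (vi) and (vii) are not independent, since the mean 
($M_o$) and the sum of squares are necessarily correlated; but it will be possible to express 
the statistic on the left as a Chi-Square variate of (n − 1) degrees of freedom, if we can define a 
set of n independent normal scores ($u_1, u_2 \ldots u_n$) of zero mean and unit variance such that 

$$u_1 = \frac{1}{\sqrt{n}} \sum_{r=1}^{r=n} c_r \qquad \text{(viii)}$$

$$u_1^2 + u_2^2 + u_3^2 \ldots u_n^2 = c_1^2 + c_2^2 + c_3^2 \ldots c_n^2 \qquad \text{(ix)}$$

$$\therefore\ \frac{n V_o}{\sigma^2} = (u_1^2 + u_2^2 + u_3^2 \ldots u_n^2) - u_1^2 = \sum_{r=2}^{r=n} u_r^2 \qquad \text{(x)}$$

From (ix) and (ii), we see that 

$$\sum_{r=1}^{r=n} u_r^2 = \sum_{r=1}^{r=n} c_r^2 = \sum_{r=1}^{r=n} \frac{(x_r - M)^2}{\sigma^2}.$$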


The sum of the squares of all the u-scores is thus a Chi-Square variate of n degrees of freedom, 
i.e. that of the sum of n independent Chi-Square variates of 1 d.f.; and each square u-score 
of the sum on the right of the above is by definition a Chi-Square variate of 1 d.f. Now we 
have seen (15.07) that the sum of n Chi-Square variates of 1 d.f. is a Chi-Square variate of n d.f. 
only if the former are independent. Hence the sum of (n − 1) square u-scores on the right 
of (x) is a Chi-Square of (n − 1) degrees of freedom, if the assumption implicit in (viii) and 
(ix) is admissible, i.e. that (viii) and (ix) are consistent with the postulate that each u-score is 
a normal score of unit variance. We get a clue to the justification of the postulate if we write 
(viii) in full as: 

$$u_1 = \frac{1}{\sqrt{n}}\, c_1 + \frac{1}{\sqrt{n}}\, c_2 + \frac{1}{\sqrt{n}}\, c_3 \ldots + \frac{1}{\sqrt{n}}\, c_n \qquad \text{(xi)}$$

Let us consider the pattern 

$$u_r = A_r \cdot c_1 + B_r \cdot c_2 + C_r \cdot c_3 \ldots + N_r \cdot c_n \qquad \text{(xii)}$$

In the above, our c-scores are of unit variance and are independent. The variance of the dis- 
tribution of the score $u_r$ thus follows from (iii) in 14.01, viz. : 

$$V(u_r) = A_r^2 + B_r^2 + C_r^2 \ldots + N_r^2.$$

Hence $u_r$ is a score of unit variance if 

$$A_r^2 + B_r^2 + C_r^2 \ldots + N_r^2 = 1 \qquad \text{(xiii)}$$

If $c_1$ is a normal score of unit variance, $A_r \cdot c_1$ is a normal score of variance $A_r^2$. Hence $u_r$ is the 
sum of n independent normal scores and is therefore itself a normal variate. Thus each of 
our u-scores is a normal score of unit variance if (xiii) holds good. 

We have now only to show that the condition defined by (ix) is consistent with (xiii). The 
next step will be easier to generalise, if we illustrate it by the case of n = 3, so that 

$$u_1 = A_1 \cdot c_1 + B_1 \cdot c_2 + C_1 \cdot c_3 ;$$
$$u_2 = A_2 \cdot c_1 + B_2 \cdot c_2 + C_2 \cdot c_3 ;$$
$$u_3 = A_3 \cdot c_1 + B_3 \cdot c_2 + C_3 \cdot c_3 .$$

In this case 

$$u_1^2 + u_2^2 + u_3^2 = (A_1^2 + A_2^2 + A_3^2)c_1^2 + (B_1^2 + B_2^2 + B_3^2)c_2^2 + (C_1^2 + C_2^2 + C_3^2)c_3^2$$
$$+\ 2(A_1B_1 + A_2B_2 + A_3B_3)c_1c_2 + 2(A_1C_1 + A_2C_2 + A_3C_3)c_1c_3$$
$$+\ 2(B_1C_1 + B_2C_2 + B_3C_3)c_2c_3 .$$

To ensure that (ix) holds good we must simultaneously make 

$$(A_1^2 + A_2^2 + A_3^2) = (B_1^2 + B_2^2 + B_3^2) = (C_1^2 + C_2^2 + C_3^2) = 1 \qquad \text{(xiv)}$$

$$(A_1B_1 + A_2B_2 + A_3B_3) = (A_1C_1 + A_2C_2 + A_3C_3) = (B_1C_1 + B_2C_2 + B_3C_3) = 0 \qquad \text{(xv)}$$

In accordance with (xiii) to ensure that our u-scores are normal scores of unit variance, we must 
also define them so that 

$$(A_1^2 + B_1^2 + C_1^2) = (A_2^2 + B_2^2 + C_2^2) = (A_3^2 + B_3^2 + C_3^2) = 1 \qquad \text{(xvi)}$$

In accordance with (viii), we have already defined 

$$A_1 = B_1 = C_1 = \frac{1}{\sqrt{3}} \qquad \text{(xvii)}$$

Thus three conditions only are sufficient to ensure that an n-fold set of u-scores defined by 
(xii) simultaneously satisfy (viii) and (ix), hence also (xii) : 

(a) each of the constants of one of the u-scores defined by (x) must be equal to $n^{-\frac{1}{2}}$ ; 
(b) the sum of the squares of the constants in each row ($A_r^2$, $B_r^2$, etc.) and in each column 
($A_1^2, A_2^2 \ldots A_n^2$ ; $B_1^2, B_2^2 \ldots B_n^2$, etc.) must also be unity ; 
(c) the total sum of cross products of corresponding constants ($A_rB_r$, etc.) in any two columns 
must be zero. 
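
That constants satisfying (a)-(c) do exist is easily shown by exhibiting one set. The Python sketch below uses a pattern commonly associated with the name of Helmert; it is one convenient choice among many and is not put forward as the set the text has in view. The function names and the trial c-scores are assumptions of the sketch. The first row consists of $n^{-\frac{1}{2}}$ throughout; the r-th of the remaining rows has r equal positive constants followed by one negative constant and zeros.

```python
from math import isclose, sqrt


def helmert(n):
    """Rows of constants (A_r, B_r ... N_r); every constant of the first row is n**-0.5."""
    rows = [[1 / sqrt(n)] * n]
    for r in range(1, n):
        scale = 1 / sqrt(r * (r + 1))
        rows.append([scale] * r + [-r * scale] + [0.0] * (n - r - 1))
    return rows


def transform(rows, c):
    """u-scores obtained by applying each row of constants to the c-scores."""
    return [sum(k * cr for k, cr in zip(row, c)) for row in rows]


n = 3
rows = helmert(n)
cols = list(zip(*rows))

# Condition (b): unit sum of squares in every row and every column.
print([sum(k * k for k in row) for row in rows])
print([sum(k * k for k in col) for col in cols])
# Condition (c): zero sum of cross products for any two columns.
print([sum(p * q for p, q in zip(cols[i], cols[j]))
       for i in range(n) for j in range(i + 1, n)])

c = [0.3, -1.2, 0.8]  # arbitrary trial values of the c-scores
u = transform(rows, c)
print(sum(v * v for v in u), sum(v * v for v in c))               # relation (ix)
print(sum(v * v for v in u[1:]),
      sum(v * v for v in c) - sum(c) ** 2 / n)                    # relations (vii) and (x)
```

Several constants are negative, as they must be, and none is numerically greater than unity.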


If we can choose the constants $A_r$, $B_r$, etc. in (xii) to satisfy these three conditions simultan- 
eously, we can say that the n-fold sum of the square u-scores is equal to the n-fold sum of the 
square c-scores, i.e. that its distribution is that of a Chi-Square variate of n degrees of freedom ; 
and if it is admissible that the sum of n Chi-Square variates of 1 d.f. is a Chi-Square variate of n 
degrees of freedom only if they are independent, we must conclude that our u-scores are inde- 
pendent variates. If so, the distribution of the sum of the squares of n − 1 of our u-scores is that 
of a Chi-Square variate of (n − 1) degrees of freedom, and the statistic defined by the left of (x) is 
itself a Chi-Square variate of (n − 1) degrees of freedom. 

What we have still to ask therefore is whether we can indeed choose $A_r$, $B_r$, etc. to satisfy 
(a)-(c) simultaneously. The student who is familiar with the elementary theory of equations will 
realise that we have indeed at our disposal the requisite number of conditions to fix the con- 
stants other than $A_1$, $B_1$, etc. = $n^{-\frac{1}{2}}$. Others may more easily grasp that this is so, if we illustrate 
the possibility of satisfying the prescribed conditions by numerical examples as in 16.02 below. 
What numerical values of the constants other than $A_1, B_1, C_1 \ldots N_1$ satisfy the conditions 
prescribed are, of course, immaterial to our purpose except to illustrate that a solution is possible. 
In choosing them, it is important to remember that no constant can be numerically greater than 
unity; but the sign need not be positive. Indeed, no solution would be possible if every constant 
had to be positive. This is consistent with our statistical requirements because the difference 
between two normal variates is a normal variate whose variance is equal to that of their sum, 
i.e. $(A_r c_1 - B_r c_2)$ is a normal variate of variance $(A_r^2 + B_r^2)$. 

Before proceeding we may make explicit the outcome of the foregoing reasoning. Our 
concern has been to define the distribution of the n-fold sum (Q) of standardised square deviations 
from the sample mean, i.e. 

$$Q = \sum_{r=1}^{r=n} \frac{(x_r - M_o)^2}{\sigma^2}.$$

So defined Q is a Chi-Square variate of f = (n − 1) degrees of freedom, i.e. 

$$F(Q) = \frac{1}{2^{\frac{1}{2}f}\,\Gamma(\tfrac{1}{2}f)}\, e^{-\frac{1}{2}Q}\, Q^{\frac{1}{2}(f-2)} \qquad \text{(xviii)}$$

The statistic whose mean value is the true variance of the score distribution in the parent universe 
is as defined in 13.02: 

$$\frac{1}{n - 1} \sum_{r=1}^{r=n} (x_r - M_o)^2, \quad \text{or in standard form} \quad S_u = \frac{1}{f} \sum_{r=1}^{r=n} \frac{(x_r - M_o)^2}{\sigma^2} \qquad \text{(xix)}$$

We may speak of this statistic as the unbiased estimate of the variance in standard form. By 
definition, therefore, 

$$S_u = \frac{Q}{f} \quad \text{and} \quad Q = f \cdot S_u.$$

Whence we may write in accordance with Case I of 15.02 

$$F(S_u) = f \cdot F(Q) ;$$

$$\therefore\ F(S_u) = \frac{f^{\frac{1}{2}f}}{2^{\frac{1}{2}f}\,\Gamma(\tfrac{1}{2}f)}\, e^{-\frac{1}{2}f S_u}\, S_u^{\frac{1}{2}(f-2)} \qquad \text{(xx)}$$

The last equation has the same form as (viii) in 15.05; but f = (n − 1) replaces n in the 
latter which describes the distribution of the mean value of the sum of n square deviations of 
unit variance from the true mean. 


EXERCISE 16.01 


Note: The following score transformations are not orthogonal, but may help the student to materialise 
some implications of such a change from one score system to another. 


1. For the Lottery Model of 11.08 (Fig. 91) evaluate the sampling distribution of $u_1 = x_1 + 3x_2$ ; 
$u_2 = 3x_1 - x_2$. 


2. If $x_1$ and $x_2$ are the results of the first and second toss of the double spin of a coin, determine the 
sampling distribution of $S_u = u_1^2 + u_2^2$, if $u_1$ and $u_2$ have the same meaning as in 1. 


3. For the 3-fold spin of a tetrahedral die with faces having 1, 2, 2, 3 pips, the first, second and 
third unit scores being x,, %2, Xz, we may define a score system 


Un = % + 3x2 + 203 3 Uy = Bx, — Xa + 2X3; Us = 2%, — Xo — Hgo 


Determine the distribution of the sample score S, = u? — uz + 2u?. 


16.02 THE ORTHOGONAL TRANSFORMATION 


Without invoking any considerations other than those dictated by the statistical require- 
ments of the problem, we have developed in 16.01 a score transformation suggested by the 
properties of the Orthogonal Lottery Model (Fig. 97). The rationale of many significance tests 
devised during the last three decades invokes such score transformations the practicability and 
numerical meaning of which are easily demonstrable without recourse to higher mathematics. 
One step in the argument calls for clarification. The reader who is not familiar with the theory 
of equations will want assurance. We shall need $n^2$ constants $A_r$, $B_r$, etc. to satisfy (ix); and 
of these n must have the value $A_1 = n^{-\frac{1}{2}} = B_1$, etc. to satisfy (viii) and (xiii). Can we choose 
the remaining n(n − 1) constants to satisfy (xiv)-(xvi)? Such is the theme of this section ; 
but it may be helpful to some readers who have unsuccessfully tackled a more advanced treatment 
if we first clarify the historical background of the test procedures dealt with below. 

The customary approach to the score transformation outlined in 16.01 is intelligible in the 
context of R. A. Fisher’s earliest work, published when the impact of the theory of relativity on 
experimental physics had lately provoked interest in the abstract geometry of the hypersphere. 
In this setting, there were new clues for the mathematician and new difficulties for the practical 
statistician unfamiliar with the new geometries. For at least two decades very few among those 
who espoused the techniques of Fisher were indeed equipped to evaluate their mathematical 
credentials. Fortunately, familiarity with the mathematical tools which Fisher’s school relied 
on is not essential to an understanding of the outcome, and the reader who skips the ensuing 
brief digression will not be at a disadvantage.* 

* Four stars mark both the beginning and the end of the passage referred to. 

**** The Geometrical Analogy. The pattern for the transformation of the 2-fold sample 
of unit scores is 

$$y_1 = a_{11} \cdot x_1 + a_{12} \cdot x_2 ; \quad y_2 = a_{21} \cdot x_1 + a_{22} \cdot x_2 \quad \text{and} \quad x_1^2 + x_2^2 = y_1^2 + y_2^2.$$

This will be a familiar lay-out to the student who has gone far in co-ordinate geometry, being 
the usual jumping-off ground for an introduction to matrix algebra; but its interpretation 
should in any case offer no difficulty. Let us suppose that P is a point in a plane whose co- 
ordinates with respect to one Cartesian grid are $x_1$ and $x_2$. If r is its distance (the radius vector 
of the point) from the origin of the grid, the theorem of Pythagoras prescribes that $r^2 = x_1^2 + x_2^2$. 
Let us now visualise the same point in a second grid with the same origin and hence the same 
radius vector. If its co-ordinates in the latter are $y_1$ and $y_2$ we may also write $r^2 = y_1^2 + y_2^2$. 
Hence $x_1^2 + x_2^2 = y_1^2 + y_2^2$. The condition $(x_1^2 + x_2^2) = (y_1^2 + y_2^2)$ thus suffices to justify the 
geometrical interpretation of two linear equations involving 2 independent variables as a rotation 
of axes. If a is the angle the second grid makes with the first, elementary trigonometry (Fig. 117) 
suffices to show that 

$$y_1 = \cos a \cdot x_1 - \sin a \cdot x_2 ;$$
$$y_2 = \sin a \cdot x_1 + \cos a \cdot x_2 .$$

If we now write $a_{11} = \cos a = a_{22}$ and $-a_{12} = \sin a = a_{21}$, the following identities follow : 

$$a_{11}^2 + a_{12}^2 = 1 = a_{21}^2 + a_{22}^2 \quad \text{and} \quad a_{11}^2 + a_{21}^2 = 1 = a_{12}^2 + a_{22}^2 \qquad \text{(i)}$$

(since $\sin^2 a + \cos^2 a = 1$). 

$$a_{11} \cdot a_{21} + a_{12} \cdot a_{22} = 0 = a_{11} \cdot a_{12} + a_{21} \cdot a_{22} \qquad \text{(ii)}$$

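[Fig. 117. Geometrical meaning of the Orthogonal Transformation: rotation of axes in a plane.
The point P has radius vector r. In the first grid $x_1 = r\cos b$ and $x_2 = r\sin b$, so that
$x_1^2 + x_2^2 = r^2$; in the second grid $y_1 = r\cos(a+b)$ and $y_2 = r\sin(a+b)$, so that $y_1^2 + y_2^2 = r^2$.
Since $\cos(a+b) = \cos a\cos b - \sin a\sin b$ and $\sin(a+b) = \sin a\cos b + \cos a\sin b$,

$$y_1 = r\cos a\cos b - r\sin a\sin b = \cos a \,.\, x_1 - \sin a \,.\, x_2 ;$$
$$y_2 = r\sin a\cos b + r\cos a\sin b = \sin a \,.\, x_1 + \cos a \,.\, x_2 .$$

If $a = 45°$, $\cos a = \frac{1}{\sqrt 2} = \sin a$ and $y_1 = \frac{1}{\sqrt 2}x_1 - \frac{1}{\sqrt 2}x_2$, $y_2 = \frac{1}{\sqrt 2}x_1 + \frac{1}{\sqrt 2}x_2$.]
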
If we interpret our score transformation in terms of the rotation of axes, we thus see that : 
(a) each horizontal and each vertical sum of the squares of the coefficients is equal to unity ; 
(b) the sum of all the vertical cross products and of all the horizontal cross products is zero. 
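
The two properties just stated can be checked numerically. The following is a minimal sketch in Python, not part of the original text; the function names `rotate` and `rotation_rules_hold` are illustrative assumptions, and the coefficients are those of the rotation through an angle a given above.

```python
import math

def rotate(x1, x2, a_deg):
    """Co-ordinates (y1, y2) of the point (x1, x2) after rotating the axes through a_deg degrees."""
    a = math.radians(a_deg)
    y1 = math.cos(a) * x1 - math.sin(a) * x2
    y2 = math.sin(a) * x1 + math.cos(a) * x2
    return y1, y2

def rotation_rules_hold(a_deg, tol=1e-12):
    """Check rule (a), unit sums of squares, and rule (b), vanishing cross products."""
    a = math.radians(a_deg)
    rows = [[math.cos(a), -math.sin(a)],
            [math.sin(a), math.cos(a)]]
    cols = [[rows[0][j], rows[1][j]] for j in range(2)]
    squares_ok = all(abs(sum(k * k for k in line) - 1.0) < tol for line in rows + cols)
    row_cross = rows[0][0] * rows[1][0] + rows[0][1] * rows[1][1]
    col_cross = cols[0][0] * cols[1][0] + cols[0][1] * cols[1][1]
    return squares_ok and abs(row_cross) < tol and abs(col_cross) < tol

# The radius vector is unchanged by the rotation: 3^2 + 4^2 = 25 in both grids.
y1, y2 = rotate(3.0, 4.0, 45.0)
print(round(y1 * y1 + y2 * y2, 9), rotation_rules_hold(30.0))
```

Any angle may be substituted for the 45° and 30° used here; the check does not depend on the particular point chosen.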
The reader who has an elementary knowledge of co-ordinate geometry in 3-dimensions will
be able to take the argument a step further. If P is a point in 3-dimensional space with
co-ordinates $x_1$, $x_2$, $x_3$, the equation which defines its radius vector (r) is $x_1^2 + x_2^2 + x_3^2 = r^2$. If,
therefore, $y_1$, $y_2$, $y_3$ are its co-ordinates in another framework with the same origin,

$$y_1^2 + y_2^2 + y_3^2 = r^2 = x_1^2 + x_2^2 + x_3^2 .$$


We may then specify the relation between the co-ordinates in the form : 

$$y_1 = a_{11} x_1 + a_{12} x_2 + a_{13} x_3 ;$$
$$y_2 = a_{21} x_1 + a_{22} x_2 + a_{23} x_3 ;$$
$$y_3 = a_{31} x_1 + a_{32} x_2 + a_{33} x_3 .$$

By elementary trigonometry, we may derive relations similar to those of (i) and (ii) above. 

In 11.07 we have metaphorically spoken of super-solid figurate numbers, the build-up of
which follows the same lines as the corresponding picturable number scores for 0, 1, 2 and 3
dimensions. In the same way, there is no objection to the use of the term radius vector to define
a sum of squares such as $r^2 = (x_1^2 + x_2^2 + x_3^2 + x_4^2)$, and we may speak metaphorically of $x_1$, $x_2$,
etc. as co-ordinates of a point in a 4-dimensional ultra-visual grid. If $r^2 = (y_1^2 + y_2^2 + y_3^2 + y_4^2)$
we may likewise speak of $y_1$, $y_2$ etc. as co-ordinates of the same point in a 4-dimensional grid
with the same origin, and define a 4-fold system of linear equations descriptive of rotation of 
axes in a so-called 4-dimensional space. Actually, we are here using the idiom of geometry to 
describe algebraic manipulations which we can no longer picture. Whether it is helpful or 
otherwise to do so depends entirely on whether it is a familiar idiom. If so, an excursion into 
the hypersphere may help us to discover new or to interpret known relations. Otherwise, the 
safe course is to keep our feet on the solid ground of algebra, as we shall now do. In any case, 
we have to rely on matrix algebra to generalise the rules which we shall now examine in greater 
detail. 

Rules of the Transformation. As a general pattern of the so-called orthogonal transformation,
it will suffice to lay out a 4-fold set of duplicate scores $u_r$ and $x_r$:

$$u_1 = A_1 x_1 + B_1 x_2 + C_1 x_3 + D_1 x_4 ;$$
$$u_2 = A_2 x_1 + B_2 x_2 + C_2 x_3 + D_2 x_4 ;$$
$$u_3 = A_3 x_1 + B_3 x_2 + C_3 x_3 + D_3 x_4 ;$$
$$u_4 = A_4 x_1 + B_4 x_2 + C_4 x_3 + D_4 x_4 .$$

We speak of this system as an orthogonal transformation if

$$u_1^2 + u_2^2 + u_3^2 + u_4^2 = x_1^2 + x_2^2 + x_3^2 + x_4^2 .$$

If this relation holds good, 4 rules subsume the relations between the linear constants A,, B,, etc. 
They are as follows : 


Rule of Column Squares. The sum of the squares of the constants associated with each
x-score is unity, which we write for $x_1$ when there are 4 independent variables as

$$\sum_{r=1}^{r=4} A_r^2 = 1 .$$

Rule of Column Cross Products. The sum of the products of the constants associated with
any single pair of x-scores in the same row is zero, which we may write for $x_1$ and $x_2$ when there
are 4 independent variables as

$$\sum_{r=1}^{r=4} A_r B_r = 0 .$$

Rule of Row Squares. The sum of the squares of the linear constants definitive of a single
u-score is unity, i.e. for $u_1$ of a 4-fold set

$$A_1^2 + B_1^2 + C_1^2 + D_1^2 = 1 .$$

Rule of Row Cross Products. The sum of the products of all pairs of constants definitive of
a single pair of u-scores is zero, i.e. for $u_1$ and $u_2$ of the 4-fold set

$$A_1 A_2 + B_1 B_2 + C_1 C_2 + D_1 D_2 = 0 .$$


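For a system of more than 3 equations the derivation of the rules is very laborious without
recourse to determinants; but the student can easily verify them for the 3-fold set if we here
give an elementary demonstration for the simplest case, viz.:

$$u_1 = A_1 x_1 + B_1 x_2 ; \qquad u_2 = A_2 x_1 + B_2 x_2 \qquad \text{and} \qquad u_1^2 + u_2^2 = x_1^2 + x_2^2 .$$

The rules then take the form:
Column Squares and Cross Products.
$$A_1^2 + A_2^2 = 1 = B_1^2 + B_2^2 \quad \text{and} \quad A_1 B_1 + A_2 B_2 = 0 .$$
Row Squares and Row Cross Products.
$$A_1^2 + B_1^2 = 1 = A_2^2 + B_2^2 \quad \text{and} \quad A_1 A_2 + B_1 B_2 = 0 .$$
The derivation is as follows. If $u_1^2 + u_2^2 = x_1^2 + x_2^2$,
$$(A_1^2 + A_2^2)x_1^2 + (B_1^2 + B_2^2)x_2^2 + 2(A_1 B_1 + A_2 B_2)x_1 x_2 = x_1^2 + x_2^2 .$$
Hence by equating coefficients we derive the column rules above:
$$A_1^2 + A_2^2 = 1 = B_1^2 + B_2^2 ;$$
$$A_1 B_1 + A_2 B_2 = 0 .$$
We now express each x as the dependent variable by solving the foregoing equations. Thus:
$$A_2 u_1 = A_1 A_2 x_1 + A_2 B_1 x_2 ;$$
$$A_1 u_2 = A_1 A_2 x_1 + A_1 B_2 x_2 ,$$
$$\therefore \; x_2 = \frac{A_1 u_2 - A_2 u_1}{A_1 B_2 - A_2 B_1} .$$
Similarly
$$x_1 = \frac{B_2 u_1 - B_1 u_2}{A_1 B_2 - A_2 B_1} .$$
If $(A_1 B_2 - A_2 B_1)^{-1} = D$, we then have
$$x_1 = B_2 \,.\, D \,.\, u_1 - B_1 \,.\, D \,.\, u_2 ;$$
$$x_2 = -A_2 \,.\, D \,.\, u_1 + A_1 \,.\, D \,.\, u_2 .$$
Whence the relation $(x_1^2 + x_2^2) = (u_1^2 + u_2^2)$ means that
$$(A_2^2 + B_2^2)D^2 \,.\, u_1^2 + (A_1^2 + B_1^2)D^2 \,.\, u_2^2 - 2(A_1 A_2 + B_1 B_2)D^2 \,.\, u_1 u_2 = u_1^2 + u_2^2 ,$$
$$\therefore \; A_2^2 + B_2^2 = D^{-2} = A_1^2 + B_1^2 ;$$
$$A_1 A_2 + B_1 B_2 = 0 .$$
The last equation corresponds to the rule of row cross products. We obtain the rule of row
squares as follows:
Since $(A_1^2 + A_2^2) = 1 = (B_1^2 + B_2^2)$,
$$2D^{-2} = (A_1^2 + A_2^2 + B_1^2 + B_2^2) = 2 .$$
Whence $D^2 = 1$ and
$$A_1^2 + B_1^2 = 1 = A_2^2 + B_2^2 .$$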
The rule of row cross-products has a special statistical meaning, because it defines a necessary condition
of the statistical independence of the u-scores. The reader who is familiar with determinants may
generalise the argument which the following illustrates. We suppose that we have a 3-fold sample,
and postulate

$$u_1 = A_1 c_1 + B_1 c_2 + C_1 c_3 ;$$
$$u_2 = A_2 c_1 + B_2 c_2 + C_2 c_3 ;$$
$$u_3 = A_3 c_1 + B_3 c_2 + C_3 c_3 .$$

The cross-product rule for the rows is

$$A_1 A_2 + B_1 B_2 + C_1 C_2 = 0 ;$$
$$A_1 A_3 + B_1 B_3 + C_1 C_3 = 0 ;$$
$$A_2 A_3 + B_2 B_3 + C_2 C_3 = 0 .$$

If our u-scores are independent, their covariance is zero. Now both u-scores and c-scores are
scores with zero mean value, so that

$$\text{Cov}(u_1, u_2) = E(u_1 \,.\, u_2) \quad \text{and} \quad \text{Cov}(c_1, c_2) = E(c_1 \,.\, c_2), \text{ etc.},$$
$$\therefore \; \text{Cov}(u_1, u_2) = E(A_1 c_1 + B_1 c_2 + C_1 c_3)(A_2 c_1 + B_2 c_2 + C_2 c_3)$$
$$= A_1 A_2 \,.\, E(c_1^2) + B_1 B_2 \,.\, E(c_2^2) + C_1 C_2 \,.\, E(c_3^2) + (A_1 B_2 + A_2 B_1) \,.\, E(c_1 \,.\, c_2)$$
$$+ (A_1 C_2 + A_2 C_1) \,.\, E(c_1 \,.\, c_3) + (B_1 C_2 + B_2 C_1) \,.\, E(c_2 \,.\, c_3) .$$

In these expressions $E(c_r^2) = 1$, because the c-scores are scores of unit variance, and $E(c_r \,.\, c_s) = 0$ in
virtue of their independence, so that

$$A_1 A_2 + B_1 B_2 + C_1 C_2 = \text{Cov}(u_1, u_2) .$$
Numerical Illustrations of the u-score transformation. In 16.01 we defined 3 conditions as
sufficient to ensure that the sum of the squares of an n-fold set of u-scores each defined in
terms of n unit sample standard scores ($c_r$) by (xii) is a Chi-Square variate of n degrees of
freedom. The general equation for the rth u-score is

$$u_r = A_r c_1 + B_r c_2 + C_r c_3 \ldots \text{etc.}$$

The three conditions are:
(i) each of the constants $A_1$, $B_1$, $C_1$, etc. referable to the specification of $u_1$ has the value $n^{-1/2}$;

(ii) the sum of the squares of the constants in each row ($A_r^2$, $B_r^2$, $C_r^2$, etc.) and of the
constants in each column (e.g. $A_1^2$, $A_2^2$, $A_3^2 \ldots A_n^2$) is unity;

(iii) the total sum of the cross product of corresponding constants in any two columns
($A_1 B_1$, $A_2 B_2$, etc.) must be zero.


We shall now illustrate the possibility of choosing the constants accordingly by recourse 
to numerical examples. 


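Case I. Two Variables. Our equations are
$$u_1 = A_1 c_1 + B_1 c_2 ;$$
$$u_2 = A_2 c_1 + B_2 c_2 .$$
To make the coefficients of $u_1$ equal to $n^{-1/2}$, in which event $A_1^2 + B_1^2 = 1$, we put
$$u_1 = \sqrt{\tfrac12}\, c_1 + \sqrt{\tfrac12}\, c_2 ;$$
$$u_2 = A_2 c_1 + B_2 c_2 .$$
To make $A_2^2 + B_2^2 = 1$ and $A_1^2 + A_2^2 = 1 = B_1^2 + B_2^2$, $A_2 = \pm\sqrt{\tfrac12}$ and $B_2 = \pm\sqrt{\tfrac12}$; and
the column cross products will vanish if the signs are opposite, when row cross products also
vanish. Thus alternative solutions are
$$u_2 = \sqrt{\tfrac12}\, c_1 - \sqrt{\tfrac12}\, c_2 \quad \text{or} \quad u_2 = -\sqrt{\tfrac12}\, c_1 + \sqrt{\tfrac12}\, c_2 .$$
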
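Case II. Three Variables. We define the constants of the first row to satisfy Rule (i)
above, and fix one constant in each remaining row and one other in each column to satisfy Rule
(ii). Our equations are then

$$u_1 = \sqrt{\tfrac13}\, c_1 + \sqrt{\tfrac13}\, c_2 + \sqrt{\tfrac13}\, c_3 ;$$
$$u_2 = A_2 c_1 + B_2 c_2 \pm (1 - A_2^2 - B_2^2)^{1/2} c_3 ;$$
$$u_3 = \pm(\tfrac23 - A_2^2)^{1/2} c_1 \pm (\tfrac23 - B_2^2)^{1/2} c_2 \pm (A_2^2 + B_2^2 - \tfrac13)^{1/2} c_3 .$$

We may satisfy the condition that the cross products of the first two and of the first and the third
column vanish by evaluating $B_2$, having fixed $A_2$ arbitrarily. We shall put $A_2 = 0$, so that the
system of constants becomes

$$\begin{matrix} \pm\sqrt{\tfrac13} & \pm\sqrt{\tfrac13} & \pm\sqrt{\tfrac13} \\ 0 & B_2 & \pm(1 - B_2^2)^{1/2} \\ \pm\sqrt{\tfrac23} & \pm(\tfrac23 - B_2^2)^{1/2} & \pm(B_2^2 - \tfrac13)^{1/2} \end{matrix}$$

The cross products of the first and second columns vanish when
$$\sqrt{\tfrac23}\,(\tfrac23 - B_2^2)^{1/2} = -\tfrac13 ,$$
$$\therefore \; B_2 = \pm\sqrt{\tfrac12} .$$
We have now the set
$$\begin{matrix} \pm\sqrt{\tfrac13} & \pm\sqrt{\tfrac13} & \pm\sqrt{\tfrac13} \\ 0 & \pm\sqrt{\tfrac12} & \pm\sqrt{\tfrac12} \\ \pm\sqrt{\tfrac23} & \pm\sqrt{\tfrac16} & \pm\sqrt{\tfrac16} \end{matrix}$$
It remains to choose the signs so that all the cross products of the columns vanish, viz.:

$$\begin{matrix} +\sqrt{\tfrac13} & +\sqrt{\tfrac13} & +\sqrt{\tfrac13} \\ 0 & -\sqrt{\tfrac12} & +\sqrt{\tfrac12} \\ +\sqrt{\tfrac23} & -\sqrt{\tfrac16} & -\sqrt{\tfrac16} \end{matrix}$$

Our final set of equations is then
$$u_1 = \tfrac{1}{\sqrt3} c_1 + \tfrac{1}{\sqrt3} c_2 + \tfrac{1}{\sqrt3} c_3 ;$$
$$u_2 = -\tfrac{1}{\sqrt2} c_2 + \tfrac{1}{\sqrt2} c_3 ;$$
$$u_3 = \sqrt{\tfrac23}\, c_1 - \tfrac{1}{\sqrt6} c_2 - \tfrac{1}{\sqrt6} c_3 .$$

Again the cross products of any two rows vanish,

$$(A_1 A_2 + B_1 B_2 + C_1 C_2) = 0 = (A_1 A_3 + B_1 B_3 + C_1 C_3) = (A_2 A_3 + B_2 B_3 + C_2 C_3) .$$

Another solution which likewise satisfies the prescribed conditions is
$$\begin{matrix} +\sqrt{\tfrac13} & +\sqrt{\tfrac13} & +\sqrt{\tfrac13} \\ +\sqrt{\tfrac12} & -\sqrt{\tfrac12} & 0 \\ +\sqrt{\tfrac16} & +\sqrt{\tfrac16} & -\sqrt{\tfrac23} \end{matrix}$$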
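
A set of constants of this kind is easy to check by machine. The sketch below is an illustration added here, not the author's own procedure: the helper name `is_orthogonal` is hypothetical, and the matrix `CASE_II` is assumed to be the first 3-fold solution with $A_2 = 0$ (rows $\sqrt{1/3}, \sqrt{1/3}, \sqrt{1/3}$; $0, -\sqrt{1/2}, +\sqrt{1/2}$; $\sqrt{2/3}, -\sqrt{1/6}, -\sqrt{1/6}$). It tests all four rules of the transformation at once.

```python
import math

def is_orthogonal(rows, tol=1e-12):
    """True if every row and column has unit sum of squares and every
    pair of rows and every pair of columns has a vanishing cross product."""
    n = len(rows)
    cols = [[rows[i][j] for i in range(n)] for j in range(n)]
    for lines in (rows, cols):
        for i in range(n):
            if abs(sum(k * k for k in lines[i]) - 1.0) > tol:
                return False
            for j in range(i + 1, n):
                if abs(sum(p * q for p, q in zip(lines[i], lines[j]))) > tol:
                    return False
    return True

s = math.sqrt
# Assumed constants of the 3-fold solution with A2 = 0.
CASE_II = [
    [s(1 / 3), s(1 / 3), s(1 / 3)],
    [0.0, -s(1 / 2), s(1 / 2)],
    [s(2 / 3), -s(1 / 6), -s(1 / 6)],
]

print(is_orthogonal(CASE_II))
```

Reversing the sign of a single entry, e.g. making both entries of the second row positive, causes the check to fail, which illustrates why the choice of signs matters.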
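Case III. Four Variables. In this case $n^{-1/2} = \tfrac12$, and we may write down our system of
constants as

$$\begin{matrix} \tfrac12 & \tfrac12 & \tfrac12 & \tfrac12 \\ A_2 & B_2 & C_2 & \pm(1 - A_2^2 - B_2^2 - C_2^2)^{1/2} \\ A_3 & B_3 & C_3 & \pm(1 - A_3^2 - B_3^2 - C_3^2)^{1/2} \\ \pm(\tfrac34 - A_2^2 - A_3^2)^{1/2} & \pm(\tfrac34 - B_2^2 - B_3^2)^{1/2} & \pm(\tfrac34 - C_2^2 - C_3^2)^{1/2} & \pm(A_2^2 + A_3^2 + B_2^2 + B_3^2 + C_2^2 + C_3^2 - \tfrac54)^{1/2} \end{matrix}$$

We may now fix any 3 constants in the first 2 columns without prejudice to the condition
that cross products with other columns must vanish. We shall write $A_2 = 0 = B_2 = B_3$;
and our system takes the form

$$\begin{matrix} \tfrac12 & \tfrac12 & \tfrac12 & \tfrac12 \\ 0 & 0 & C_2 & \pm(1 - C_2^2)^{1/2} \\ A_3 & 0 & C_3 & \pm(1 - A_3^2 - C_3^2)^{1/2} \\ \pm(\tfrac34 - A_3^2)^{1/2} & \pm(\tfrac34)^{1/2} & \pm(\tfrac34 - C_2^2 - C_3^2)^{1/2} & \pm(A_3^2 + C_2^2 + C_3^2 - \tfrac54)^{1/2} \end{matrix}$$

To satisfy the condition that cross products of the first two columns vanish
$$(\tfrac34 - A_3^2)^{1/2}(\tfrac34)^{1/2} = -\tfrac14 ,$$
$$\therefore \; A_3 = \pm\sqrt{\tfrac23} .$$
The sign of $A_3$ is immaterial but $(\tfrac34 - A_3^2)^{1/2}$ and $(\tfrac34)^{1/2}$ in the bottom row must have opposite signs,
and we may put

$$\begin{matrix} \tfrac12 & \tfrac12 & \tfrac12 & \tfrac12 \\ 0 & 0 & C_2 & \pm(1 - C_2^2)^{1/2} \\ \sqrt{\tfrac23} & 0 & C_3 & \pm(\tfrac13 - C_3^2)^{1/2} \\ +\sqrt{\tfrac{1}{12}} & -\sqrt{\tfrac34} & \pm(\tfrac34 - C_2^2 - C_3^2)^{1/2} & \pm(C_2^2 + C_3^2 - \tfrac{7}{12})^{1/2} \end{matrix}$$

We now fix $C_2$ and $C_3$ so that the cross products of the first and third and of the second and third
columns vanish. By taking the third with the second we get

$$\sqrt{\tfrac34}\,(\tfrac34 - C_2^2 - C_3^2)^{1/2} = \tfrac14 ,$$
$$\therefore \; (\tfrac34 - C_2^2 - C_3^2)^{1/2} = +\sqrt{\tfrac{1}{12}} .$$
Without prejudice we may fix the sign of the above, and obtain from the first and third columns
$$\tfrac14 + \sqrt{\tfrac23}\, C_3 + \tfrac{1}{12} = 0 ,$$
$$\therefore \; C_3 = -\sqrt{\tfrac16} .$$
Whence from the foregoing expression for $(\tfrac34 - C_2^2 - C_3^2)^{1/2}$
$$\tfrac{7}{12} - C_2^2 = \tfrac{1}{12} \quad \text{and} \quad C_2 = \pm\sqrt{\tfrac12} .$$
We thus have the system

$$\begin{matrix} \tfrac12 & \tfrac12 & \tfrac12 & \tfrac12 \\ 0 & 0 & \pm\sqrt{\tfrac12} & \pm\sqrt{\tfrac12} \\ \sqrt{\tfrac23} & 0 & -\sqrt{\tfrac16} & \pm\sqrt{\tfrac16} \\ \sqrt{\tfrac{1}{12}} & -\sqrt{\tfrac34} & \sqrt{\tfrac{1}{12}} & \pm\sqrt{\tfrac{1}{12}} \end{matrix}$$

We can now fix the signs to ensure that the cross products of the first and last and the second and
last columns vanish, and the only arrangement which then makes the cross products of the third
and last vanish also is

$$\begin{matrix} \tfrac12 & \tfrac12 & \tfrac12 & \tfrac12 \\ 0 & 0 & +\sqrt{\tfrac12} & -\sqrt{\tfrac12} \\ \sqrt{\tfrac23} & 0 & -\sqrt{\tfrac16} & -\sqrt{\tfrac16} \\ \sqrt{\tfrac{1}{12}} & -\sqrt{\tfrac34} & \sqrt{\tfrac{1}{12}} & \sqrt{\tfrac{1}{12}} \end{matrix}$$

Again the cross product sum of any two rows vanishes in accordance with zero covariance.
Another solution which is likewise consistent with the prescribed conditions is

$$u_1 = \tfrac12 c_1 + \tfrac12 c_2 + \tfrac12 c_3 + \tfrac12 c_4 ;$$
$$u_2 = \tfrac{1}{\sqrt2} c_1 - \tfrac{1}{\sqrt2} c_2 ;$$
$$u_3 = \tfrac{1}{\sqrt2} c_3 - \tfrac{1}{\sqrt2} c_4 ;$$
$$u_4 = \tfrac12 c_1 + \tfrac12 c_2 - \tfrac12 c_3 - \tfrac12 c_4 .$$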
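
The practical point of the orthogonal relation is that the sum of squares of the u-scores equals that of the c-scores, whatever values the latter take. The following sketch is an added illustration under stated assumptions: `CASE_III` holds the 4-fold constants with $A_2 = 0 = B_2 = B_3$ as read from the derivation above, the names `transform` and `sum_sq` are hypothetical, and the c-scores are arbitrary made-up numbers.

```python
import math

s = math.sqrt
# Assumed constants of the 4-fold system with A2 = 0 = B2 = B3.
CASE_III = [
    [0.5, 0.5, 0.5, 0.5],
    [0.0, 0.0, s(1 / 2), -s(1 / 2)],
    [s(2 / 3), 0.0, -s(1 / 6), -s(1 / 6)],
    [s(1 / 12), -s(3 / 4), s(1 / 12), s(1 / 12)],
]

def transform(rows, c):
    """Return the u-scores, each a linear function of the c-scores."""
    return [sum(k * x for k, x in zip(row, c)) for row in rows]

def sum_sq(v):
    return sum(x * x for x in v)

c_scores = [1.0, -2.0, 0.5, 3.0]   # arbitrary illustrative values
u_scores = transform(CASE_III, c_scores)

# The first u-score is n^(-1/2) times the sum of the c-scores,
# and the two sums of squares agree.
print(u_scores[0], sum_sq(u_scores), sum_sq(c_scores))
```

Because the first row consists of equal constants, $u_1$ carries the sample mean, which is what allows the negative term to be removed from expressions such as (vi) of 16.01.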


16.03 CONFIDENCE LIMITS OF VARIANCE ESTIMATES 


In Chapter 13 two issues involving significance of variance estimates emerged. One dealt 
with in 16.07 below involves the consistency of two such estimates, when the end in view is 
to test the null hypothesis that the universe of choice is homogeneous w.r.t. the criteria of 
classification. The other arises in connexion with the construction of a balance sheet exhibiting 
components of variance, if we reject the null hypothesis. We may then wish to set limits of 
confidence to the items in our balance sheet. 

If we can assume that the distribution of all our score components is approximately normal,
we can regard each estimate ($V_s$) as a Gamma variate, which we can express in Chi-Square form
by the substitution

$$S = \frac{n V_s}{\sigma^2} \qquad (i)$$

As shown in 16.01, the distribution of S is that of Chi-Square for f degrees of freedom if
f = (n − 1) for an n-fold set of normally distributed score values. In (i) above $V_s$ is the mean
square deviation from the sample mean. Alternatively, we may express S in terms of the
unbiased estimate ($s^2$) of the sample variance by recourse to the identity

$$n V_s = (n - 1)s^2 = f s^2 , \qquad \therefore \; S = \frac{f s^2}{\sigma^2} \qquad (ii)$$

By recourse to the table of the Chi-Square integral for the appropriate value of f, we can
use (ii) to define confidence limits of the true variance in terms of the unbiased estimate. The
procedure is as follows. Let us denote by S = a and S = b respectively the values of the
Chi-Square integral which makes the total area under the curve equal to $P_a$ and $P_b$, i.e.

$$\int_0^a F(S)\,dS = P_a \quad \text{and} \quad \int_0^b F(S)\,dS = P_b ,$$

$$\therefore \; \int_a^b F(S)\,dS = P_b - P_a \qquad (iii)$$

In this expression ($P_b - P_a$) defines the probability that the values of S will not be outside the
range from a to b. Alternatively, we may say that a and b respectively define lower and upper
limits to a range of S (Chi-Square) values whose net expectation is $P_b - P_a$. From tables of
the Chi-Square integral we may at once evaluate (iii) for the assigned values of S definitive of
a range whose expectation is 95 per cent., as when $P_a = 0.025$ and $P_b = 0.975$. The tables of
Chi-Square for 9 d.f. cite a score value a = 2.7 for $P_a \simeq 0.025$ and a score value b = 19 for
$P_b \simeq 0.975$. Thus 2.7 and 19 define a range of S values whose total expectation is
approximately 95 per cent. and the odds are therefore 20 : 1 that a value of S will neither exceed 19 nor
fall short of 2.7.

From (ii) above, the assertion that S lies between the limits a and b inclusive is equivalent
to the assertion that the true variance $\sigma^2$ lies in the range from

$$\sigma^2 = \frac{f s^2}{a} \quad \text{to} \quad \sigma^2 = \frac{f s^2}{b} \qquad (iv)$$

Both assertions are true of 95 per cent. of samples, if we choose appropriate values of a and b as
indicated above. Thus our risk of error will be 5 per cent. if we consistently apply the rule
implicit in (iv) to assert a range of values within which $\sigma^2$ lies. For example, we may suppose
that our unbiased estimate of variance ($s^2$) referable to a 10-fold sample is 12, in which case
f = 9 as before. We then have

$$\sigma^2 = \frac{9}{2.7}(12) = 40 \quad \text{and} \quad \sigma^2 = \frac{9}{19}(12) \simeq 5.68 .$$

We are thus entitled to say that the odds are about 20 : 1 that we shall not err in asserting
the true value of the variance of which our unbiased estimate is 12 to be within the range
5.68 to 40.
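
The arithmetic of this example can be reproduced in a few lines. The sketch below is an added illustration; the function name `variance_confidence_limits` is hypothetical, and the values a = 2.7 and b = 19 are simply the tabulated Chi-Square values for 9 d.f. quoted in the text, not computed quantiles.

```python
def variance_confidence_limits(s2, f, a, b):
    """Limits of the true variance from the unbiased estimate s2 with f d.f.

    a and b are the lower and upper tabulated Chi-Square values for f d.f.
    Since S = f * s2 / variance, the smaller table value gives the upper limit.
    """
    lower = f * s2 / b
    upper = f * s2 / a
    return lower, upper

# 10-fold sample, unbiased variance estimate 12, f = 9, table values 2.7 and 19.
lower, upper = variance_confidence_limits(12.0, 9, 2.7, 19.0)
print(round(lower, 2), round(upper, 2))
```

The limits lie unequally on either side of the estimate of 12, which reflects the skewness of the Chi-Square distribution.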

The asymmetry of this result arises from the skewness of the Chi-Square distribution. 
The foregoing argument involves the assumption that our sample variances are referable to
normally distributed score values. This may or may not be a legitimate assumption, if we base
our balance sheet on the postulates of Model II in 13.04, but it cannot be legitimate if we adopt
the Model I viewpoint.


16.04 DEGREES OF FREEDOM FOR VARIANCE ESTIMATES


In Chapter 13 we have derived unbiased estimates of the variance of the rc-fold sample 
score distribution of a homogeneous normal universe classified w.r.t. two criteria and of the 
nrc-fold sample score distribution referable to three criteria of classification. We shall now 
see that the corresponding standardised sums of square deviations are in each case Chi-Square 
variates, and the divisor of each sum of squares is the appropriate number of degrees of freedom. 
It will suffice to consider the statistics $s_c^2$ and $s_r^2$ for a set-up involving 2 criteria of classification
as in 13.03 and 13.04. If $M_i$ is a column mean and $M_x$ is the grand mean of the sample, we
define $s_c^2$ by the relation

$$s_c^2 = \frac{r}{c-1}\sum_{i=1}^{i=c}(M_i - M_x)^2 \qquad (i)$$

If $\sigma^2$ is the true variance of the score distribution, that of the mean of the r-fold column sample is
given by

$$\sigma_c^2 = \frac{\sigma^2}{r} \qquad (ii)$$

Whence we have

$$\frac{s_c^2}{\sigma^2} = \frac{1}{c-1}\sum_{i=1}^{i=c}\frac{(M_i - M_x)^2}{\sigma_c^2} \qquad (iii)$$

We shall write $S_c = (c - 1)s_c^2$, so that

$$\frac{S_c}{\sigma^2} = \sum_{i=1}^{i=c}\frac{(M_i - M_x)^2}{\sigma_c^2} \qquad (iv)$$

If M is the true mean score, we may write as in 16.01

$$\frac{S_c}{\sigma^2} = \sum_{i=1}^{i=c}\frac{(M_i - M)^2}{\sigma_c^2} - \frac{c(M_x - M)^2}{\sigma_c^2} \qquad (v)$$

In this expression, we have

$$M_x = \frac{1}{c}\sum_{i=1}^{i=c} M_i \qquad (vi)$$

Thus $M_x$ is the mean of a c-fold sample of column mean scores. Hence the variance of the
distribution of $M_x$ is given by

$$\sigma_x^2 = \frac{\sigma_c^2}{c} , \qquad \therefore \; \frac{S_c}{\sigma^2} = \sum_{i=1}^{i=c}\frac{(M_i - M)^2}{\sigma_c^2} - \frac{(M_x - M)^2}{\sigma_x^2} \qquad (vii)$$

If the score distribution is normal, that of the mean is also normal. Hence we have now
expressed (iv) as the difference between a c-fold sum of square normal scores of unit variance and
one square normal score of unit variance, i.e. as the difference between a Chi-Square variate of
c degrees of freedom and a Chi-Square variate of 1 degree of freedom like the expression defined
by (vi) of 16.01. In (vii) above $M_x$ is the mean of the $M_i$ as $M_x$ is the mean of the $x_r$ in (vi) of 16.01. Thus
the two expressions are in all respects comparable. It is therefore unnecessary to repeat with
appropriate change of symbols the orthogonal transformation of 16.01 in order to show that
($S_c \div \sigma^2$) is a Chi-Square variate of c − 1 degrees of freedom.

Mutatis mutandis the same argument applies to the statistic 


$$S_r = (r - 1)s_r^2 = c\sum_{j=1}^{j=r}(M_j - M_x)^2 \qquad (viii)$$

In this case ($S_r \div \sigma^2$) is a Chi-Square variate of (r − 1) degrees of freedom.
The total sample variance yields an unbiased estimate of the same form as (v) and (vi)
in 16.01 based on rc-scores.


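We may write it as

$$(rc - 1)s_t^2 = S_t = \sum_{i=1}^{i=c}\sum_{j=1}^{j=r}(x_{ij} - M_x)^2 \qquad (ix)$$

As in the derivation of (v) we obtain

$$\frac{S_t}{\sigma^2} = \sum_{i=1}^{i=c}\sum_{j=1}^{j=r}\frac{(x_{ij} - M)^2}{\sigma^2} - \frac{rc(M_x - M)^2}{\sigma^2} \qquad (x)$$

Since $M_x$ is the mean of an rc-fold sample, the variance of its distribution is given by

$$\sigma_x^2 = \frac{\sigma^2}{rc} , \qquad \therefore \; \frac{S_t}{\sigma^2} = \sum_{i=1}^{i=c}\sum_{j=1}^{j=r}\frac{(x_{ij} - M)^2}{\sigma^2} - \frac{(M_x - M)^2}{\sigma_x^2} \qquad (xi)$$

It is thus evident that $S_t$ in standard form is a Chi-Square variate of (rc − 1) d.f.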

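That the numerator of the estimate $s_z^2$ of 13.03 and 13.04 is also in standard form a
Chi-Square variate with (r − 1)(c − 1) degrees of freedom is less easy to see, and calls for closer
examination. We define it by the relation

$$(r - 1)(c - 1)s_z^2 = \sum_{i=1}^{i=c}\sum_{j=1}^{j=r}(x_{ij} - M_x)^2 - c\sum_{j=1}^{j=r}(M_j - M_x)^2 - r\sum_{i=1}^{i=c}(M_i - M_x)^2 .$$

We may write the preceding expression more fully in standard form as

$$\frac{(r - 1)(c - 1)s_z^2}{\sigma^2} = \sum_{i=1}^{i=c}\sum_{j=1}^{j=r}\frac{(x_{ij} - M_x)^2}{\sigma^2} - \sum_{j=1}^{j=r}\frac{(M_j - M_x)^2}{\sigma_r^2} - \sum_{i=1}^{i=c}\frac{(M_i - M_x)^2}{\sigma_c^2} .$$

We may transform this as above to 4 sets of square normal scores of unit variance, viz.

$$\frac{(r - 1)(c - 1)s_z^2}{\sigma^2} = \sum_{i=1}^{i=c}\sum_{j=1}^{j=r}\frac{(x_{ij} - M)^2}{\sigma^2} - \frac{(M_x - M)^2}{\sigma_x^2} - \left\{\sum_{j=1}^{j=r}\frac{(M_j - M)^2}{\sigma_r^2} - \frac{(M_x - M)^2}{\sigma_x^2}\right\} - \left\{\sum_{i=1}^{i=c}\frac{(M_i - M)^2}{\sigma_c^2} - \frac{(M_x - M)^2}{\sigma_x^2}\right\} \qquad (xii)$$
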
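It will be more easy to follow the appropriate orthogonal transformation, if we drop the
subscript symbolism, and deal with the procedure. Accordingly, we assume the following
schema of independent normal score values

    Scores                                              Row means
    x_1       x_2        x_3        x_4                 M_{1.4}
    x_5       x_6        x_7        x_8                 M_{5.8}
    x_9       x_10       x_11       x_12                M_{9.12}
    Column means
    M_{1.9}   M_{2.10}   M_{3.11}   M_{4.12}            M_x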
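For this assemblage we may write the first term of the expression on the right of (xii) in
the form

$$\sum_{i=1}^{i=c}\sum_{j=1}^{j=r}\frac{(x_{ij} - M)^2}{\sigma^2} = \sum_{s=1}^{s=12}\frac{(x_s - M)^2}{\sigma^2} = \sum_{s=1}^{s=12} c_s^2 \qquad (xiii)$$

By definition

$$12M_x = \sum_{s=1}^{s=12} x_s \quad \text{and} \quad 12(M_x - M) = \sum_{s=1}^{s=12}(x_s - M) , \qquad \therefore \; \frac{12(M_x - M)}{\sigma} = \sum_{s=1}^{s=12} c_s .$$

Since $\sigma = \sqrt{12}\,\sigma_x$,

$$\frac{M_x - M}{\sigma_x} = \frac{1}{\sqrt{12}}\sum_{s=1}^{s=12} c_s \qquad (xiv)$$

We shall now assume that it is possible to transform the c-scores into u-scores in accordance
with the orthogonal relation

$$\sum_{s=1}^{s=12} u_s^2 = \sum_{s=1}^{s=12} c_s^2 \qquad (xv)$$

$$u_1 = \frac{1}{\sqrt{12}}\sum_{s=1}^{s=12} c_s = \frac{M_x - M}{\sigma_x} \qquad (xvi)$$

Hence we have the following expression for the first two terms on the right hand side
of (xii):

$$\sum_{j=1}^{j=r}\sum_{i=1}^{i=c}\frac{(x_{ij} - M)^2}{\sigma^2} - \frac{(M_x - M)^2}{\sigma_x^2} = \sum_{s=2}^{s=12} u_s^2 \qquad (xvii)$$

We shall now write in the term of (xii) involving the row means

$$\frac{M_{1.4} - M}{\sigma_r} = v_1 ; \qquad \frac{M_{5.8} - M}{\sigma_r} = v_2 ; \qquad \frac{M_{9.12} - M}{\sigma_r} = v_3 \qquad (xviii)$$

Similarly, for the term involving the column means we shall put

$$\frac{M_{1.9} - M}{\sigma_c} = w_1 ; \qquad \frac{M_{2.10} - M}{\sigma_c} = w_2 ; \qquad \frac{M_{3.11} - M}{\sigma_c} = w_3 ; \qquad \frac{M_{4.12} - M}{\sigma_c} = w_4 \qquad (xix)$$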
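Hence the last two terms in (xii) become

$$\sum_{j=1}^{j=r}\frac{(M_j - M)^2}{\sigma_r^2} = v_1^2 + v_2^2 + v_3^2 \qquad (xx)$$

$$\sum_{i=1}^{i=c}\frac{(M_i - M)^2}{\sigma_c^2} = w_1^2 + w_2^2 + w_3^2 + w_4^2 \qquad (xxi)$$

The v- and w-scores so defined are linear functions of the c-scores. For instance,

$$(M_{1.4} - M) = \tfrac14\{(x_1 - M) + (x_2 - M) + (x_3 - M) + (x_4 - M)\} , \qquad \therefore \; \frac{M_{1.4} - M}{\sigma} = \tfrac14(c_1 + c_2 + c_3 + c_4) .$$

Since $\sqrt4\,\sigma_r = \sigma$,

$$\frac{M_{1.4} - M}{\sigma_r} = \tfrac12(c_1 + c_2 + c_3 + c_4) = v_1 \qquad (xxii)$$

Similarly, we have

$$v_2 = \tfrac12(c_5 + c_6 + c_7 + c_8) \quad \text{and} \quad v_3 = \tfrac12(c_9 + c_{10} + c_{11} + c_{12}) ,$$

$$\therefore \; v_1 + v_2 + v_3 = \tfrac12\sum_{s=1}^{s=12} c_s ,$$

$$\tfrac{1}{\sqrt3}(v_1 + v_2 + v_3) = \tfrac{1}{\sqrt{12}}\sum_{s=1}^{s=12} c_s = u_1 \qquad (xxiii)$$

In the same way we derive

$$w_1 = \tfrac{1}{\sqrt3}(c_1 + c_5 + c_9) ; \qquad w_2 = \tfrac{1}{\sqrt3}(c_2 + c_6 + c_{10}) ;$$
$$w_3 = \tfrac{1}{\sqrt3}(c_3 + c_7 + c_{11}) ; \qquad w_4 = \tfrac{1}{\sqrt3}(c_4 + c_8 + c_{12}) \qquad (xxiv)$$
$$\therefore \; \tfrac12(w_1 + w_2 + w_3 + w_4) = u_1 \qquad (xxv)$$

Thus our v-scores and w-scores are each expressible as linear functions of c-scores, as are our
u-scores by definition. It follows that the v-scores and w-scores are linear functions of the
latter; and we may assume the orthogonal relations

$$v_1^2 + v_2^2 + v_3^2 = u_1^2 + u_2^2 + u_3^2 ,$$

$$w_1^2 + w_2^2 + w_3^2 + w_4^2 = u_1^2 + u_4^2 + u_5^2 + u_6^2 ,$$

$$\therefore \; \sum_{j=1}^{j=r}\frac{(M_j - M)^2}{\sigma_r^2} + \sum_{i=1}^{i=c}\frac{(M_i - M)^2}{\sigma_c^2} = 2u_1^2 + \sum_{s=2}^{s=6} u_s^2 \qquad (xxvi)$$

By substitution from (xvii) and (xxvi) in (xii) we have

$$\frac{(r - 1)(c - 1)s_z^2}{\sigma^2} = u_7^2 + u_8^2 + u_9^2 + u_{10}^2 + u_{11}^2 + u_{12}^2 .$$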



This expression involves 6 = 2 . 3 = (r − 1)(c − 1) square u-scores, i.e. (r − 1)(c − 1)
independent Chi-Square variates of 1 d.f. The general pattern of the transformation is as follows:

$$\frac{(r - 1)(c - 1)s_z^2}{\sigma^2} = \sum_{s=1}^{s=rc} u_s^2 + u_1^2 - \left[2u_1^2 + \sum_{s=2}^{s=r} u_s^2 + \sum_{s=r+1}^{s=r+c-1} u_s^2\right]$$

$$= \sum_{s=2}^{s=rc} u_s^2 - \sum_{s=2}^{s=r+c-1} u_s^2$$

$$= \sum_{s=r+c}^{s=rc} u_s^2 .$$

The last expression contains rc − (r + c) + 1 = (r − 1)(c − 1) terms.
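
The bookkeeping of degrees of freedom has an exact counterpart in the sums of squares themselves, and a short computation makes the balance sheet concrete. The sketch below is an added illustration; the function names and the 3 × 4 grid of scores are made up for the purpose, and nothing here is taken from Chapter 13 beyond the general definitions.

```python
def two_way_sums_of_squares(grid):
    """Split the total sum of squares of an r x c grid into row, column and residual parts."""
    r, c = len(grid), len(grid[0])
    grand = sum(sum(row) for row in grid) / (r * c)
    row_means = [sum(row) / c for row in grid]
    col_means = [sum(grid[j][i] for j in range(r)) / r for i in range(c)]
    total = sum((x - grand) ** 2 for row in grid for x in row)
    rows_ss = c * sum((m - grand) ** 2 for m in row_means)
    cols_ss = r * sum((m - grand) ** 2 for m in col_means)
    residual = sum((grid[j][i] - row_means[j] - col_means[i] + grand) ** 2
                   for j in range(r) for i in range(c))
    return total, rows_ss, cols_ss, residual

def degrees_of_freedom(r, c):
    """Divisors appropriate to each sum of squares."""
    return {"total": r * c - 1, "rows": r - 1, "columns": c - 1,
            "residual": (r - 1) * (c - 1)}

# Hypothetical 3 x 4 score grid (r = 3 rows, c = 4 columns).
GRID = [[4.0, 7.0, 5.0, 9.0],
        [6.0, 8.0, 4.0, 11.0],
        [3.0, 9.0, 6.0, 10.0]]
total, rows_ss, cols_ss, residual = two_way_sums_of_squares(GRID)
df = degrees_of_freedom(3, 4)
print(total, rows_ss + cols_ss + residual, df)
```

For the 3 × 4 grid the residual divisor is 6, in agreement with the 6 square u-scores of the worked case, and the three component divisors add up to rc − 1 = 11.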
At this point, the reader may reasonably want assurance that the simultaneous transformation
of (r + c − 1) of the u-scores into v-scores and w-scores in the foregoing is consistent with the
orthogonal relation between the entire rc-fold set-up of u-scores and c-scores. In accordance
with results obtained in 16.02, the following arbitrary constants satisfy the v and w score
transformations:

$$u_1 = \tfrac{1}{\sqrt3} v_1 + \tfrac{1}{\sqrt3} v_2 + \tfrac{1}{\sqrt3} v_3 ;$$
$$u_2 = \tfrac{1}{\sqrt2} v_1 - \tfrac{1}{\sqrt2} v_2 ;$$
$$u_3 = \tfrac{1}{\sqrt6} v_1 + \tfrac{1}{\sqrt6} v_2 - \sqrt{\tfrac23}\, v_3 ;$$

* * * * * * * *

$$u_1 = \tfrac12 w_1 + \tfrac12 w_2 + \tfrac12 w_3 + \tfrac12 w_4 ;$$
$$u_4 = \tfrac{1}{\sqrt2} w_1 - \tfrac{1}{\sqrt2} w_2 ;$$
$$u_5 = \tfrac{1}{\sqrt2} w_3 - \tfrac{1}{\sqrt2} w_4 ;$$
$$u_6 = \tfrac12 w_1 + \tfrac12 w_2 - \tfrac12 w_3 - \tfrac12 w_4 .$$

It will suffice to examine the orthogonal property of the pair $u_3$ and $u_6$ so defined when we
express them in terms of the c-scores. From (xxii) and (xxv), we have

$$u_3 = \tfrac{1}{\sqrt{24}}(c_1 + c_2 + c_3 + c_4 + c_5 + c_6 + c_7 + c_8) - \tfrac{1}{\sqrt6}(c_9 + c_{10} + c_{11} + c_{12}) ;$$

$$u_6 = \tfrac{1}{\sqrt{12}}(c_1 + c_2 + c_5 + c_6 + c_9 + c_{10}) - \tfrac{1}{\sqrt{12}}(c_3 + c_4 + c_7 + c_8 + c_{11} + c_{12}) .$$

Evidently the sum of the squares of the constants in each row is unity and the sum of the cross
products vanishes as we see if we set them out as below:

          c_1      c_2      c_3      c_4      c_5      c_6      c_7      c_8      c_9      c_10     c_11     c_12
    u_3   1/√24    1/√24    1/√24    1/√24    1/√24    1/√24    1/√24    1/√24    -1/√6    -1/√6    -1/√6    -1/√6
    u_6   1/√12    1/√12    -1/√12   -1/√12   1/√12    1/√12    -1/√12   -1/√12   1/√12    1/√12    -1/√12   -1/√12

For this set-up we can thus define 6 u-scores as linear functions of the c-scores in conformity
with the rules: (a) that the sum of the cross products vanishes for any pair of rows; (b) that the
sum of the squares of the constants in any one row is unity. It remains to define 6 u-scores
involving 12 × 6 = 72 linear constants of which the orthogonal relation for the square constants
in columns and rows fix the 12 of the last row and the last member of the 5 remaining rows, i.e.
17 in all. We can fix 55 remaining constants arbitrarily to satisfy the condition that the sum
of the products of corresponding constants in any pair of columns vanishes.*

In (vi) of 16.01, we have eliminated the negative term by putting

$$u_1 = n^{-1/2}(c_1 + c_2 + \ldots + c_n) .$$

This procedure depends on the fact that the statistics under consideration so far involve equal
numbers of scores in each column or row. In a set-up involving only one criterion of
classification as in 13.07, we may have different numbers ($r_i$) of items in different columns, and it is
necessary to modify the foregoing procedure. Let us first consider the statistic defined by
(viii) of 13.07, viz.:

$$s_c^2 = \frac{1}{c - 1}\sum_{i=1}^{i=c} r_i (M_i - M_x)^2 .$$

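* In more general terms the schema of the double transformation is as follows:

$$\sum_{i=1}^{i=c}\sum_{j=1}^{j=r}\frac{(x_{ij} - M)^2}{\sigma^2} = \sum_{i=1}^{i=c}\sum_{j=1}^{j=r} c_{ij}^2 = \sum_{m=1}^{m=rc} u_m^2 \qquad (i)$$

$$\frac{M_x - M}{\sigma_x} = \frac{1}{\sqrt{rc}}\sum_{i=1}^{i=c}\sum_{j=1}^{j=r} c_{ij} = u_1 \qquad (ii)$$

$$\frac{M_j - M}{\sigma_r} = v_j = \frac{1}{\sqrt c}\sum_{i=1}^{i=c} c_{ij} \quad \text{and} \quad \frac{1}{\sqrt r}\sum_{j=1}^{j=r} v_j = \frac{1}{\sqrt{rc}}\sum_{i=1}^{i=c}\sum_{j=1}^{j=r} c_{ij} = u_1 \qquad (iii)$$

$$\sum_{j=1}^{j=r} v_j^2 = \sum_{m=1}^{m=r} u_m^2 \qquad (iv)$$

$$\frac{M_i - M}{\sigma_c} = w_i = \frac{1}{\sqrt r}\sum_{j=1}^{j=r} c_{ij} \quad \text{and} \quad \frac{1}{\sqrt c}\sum_{i=1}^{i=c} w_i = \frac{1}{\sqrt{rc}}\sum_{i=1}^{i=c}\sum_{j=1}^{j=r} c_{ij} = u_1 \qquad (v)$$

$$\sum_{i=1}^{i=c} w_i^2 = u_1^2 + \sum_{m=r+1}^{m=r+c-1} u_m^2 \qquad (vi)$$

$$\sum_{i=1}^{i=c}\sum_{j=1}^{j=r}\frac{(x_{ij} - M)^2}{\sigma^2} - \frac{(M_x - M)^2}{\sigma_x^2} - \left\{\sum_{j=1}^{j=r}\frac{(M_j - M)^2}{\sigma_r^2} - \frac{(M_x - M)^2}{\sigma_x^2}\right\} - \left\{\sum_{i=1}^{i=c}\frac{(M_i - M)^2}{\sigma_c^2} - \frac{(M_x - M)^2}{\sigma_x^2}\right\}$$

$$= \sum_{m=1}^{m=rc} u_m^2 + u_1^2 - \sum_{m=1}^{m=r} u_m^2 - u_1^2 - \sum_{m=r+1}^{m=r+c-1} u_m^2$$

$$= \sum_{m=1}^{m=rc} u_m^2 - \sum_{m=1}^{m=r+c-1} u_m^2$$

$$= \sum_{m=r+c}^{m=rc} u_m^2 , \text{ with } (rc - r - c + 1) = (r - 1)(c - 1) \text{ terms.}$$
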
When there are two columns (c − 1) = 1; and we shall now see that the ratio of $s_c^2$ to the true
variance of the score distribution is a Chi-Square variate for 1 d.f. If we put $r_1 = a$, $r_2 = b$
and (a + b) = n, we may then write

$$\frac{s_c^2}{\sigma^2} = \frac{a(M_a - M_x)^2 + b(M_b - M_x)^2}{\sigma^2} \qquad (xxvii)$$

In this expression the sample mean is given by

$$M_x = \frac{a \,.\, M_a + b \,.\, M_b}{n} \qquad (xxviii)$$

Whence we have

$$a(M_a - M_x)^2 + b(M_b - M_x)^2 = a \,.\, M_a^2 + b \,.\, M_b^2 + nM_x^2 - 2M_x(a \,.\, M_a + b \,.\, M_b)$$
$$= a \,.\, M_a^2 + b \,.\, M_b^2 + nM_x^2 - 2nM_x^2 ,$$
$$\therefore \; a(M_a - M_x)^2 + b(M_b - M_x)^2 = a \,.\, M_a^2 + b \,.\, M_b^2 - n \,.\, M_x^2 \qquad (xxix)$$

If the unknown true mean score is M, we may put

$$a(M_a - M)^2 + b(M_b - M)^2 = a \,.\, M_a^2 + b \,.\, M_b^2 - 2M(a \,.\, M_a + b \,.\, M_b) + nM^2$$
$$= a \,.\, M_a^2 + b \,.\, M_b^2 - 2nM \,.\, M_x + nM^2$$
$$= (a \,.\, M_a^2 + b \,.\, M_b^2 - nM_x^2) + (nM_x^2 - 2nM \,.\, M_x + nM^2) .$$

Whence from (xxix)

$$a(M_a - M)^2 + b(M_b - M)^2 = a(M_a - M_x)^2 + b(M_b - M_x)^2 + n(M_x - M)^2 ,$$
$$\therefore \; a(M_a - M_x)^2 + b(M_b - M_x)^2 = a(M_a - M)^2 + b(M_b - M)^2 - n(M_x - M)^2 ,$$
$$\therefore \; \frac{s_c^2}{\sigma^2} = \frac{a(M_a - M)^2}{\sigma^2} + \frac{b(M_b - M)^2}{\sigma^2} - \frac{n(M_x - M)^2}{\sigma^2} \qquad (xxx)$$
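
The identity behind (xxx) holds for any two columns of unequal size and any assumed value of the true mean, so it lends itself to a direct numerical check. The sketch below is an added illustration; the function name and the two columns of scores are made up, and the symbols a, b, n, $M_a$, $M_b$, $M_x$ and M are used as in the text.

```python
def between_column_identity(col_a, col_b, true_mean):
    """Return both sides of
    a(Ma - Mx)^2 + b(Mb - Mx)^2 = a(Ma - M)^2 + b(Mb - M)^2 - n(Mx - M)^2."""
    a, b = len(col_a), len(col_b)
    n = a + b
    m_a = sum(col_a) / a
    m_b = sum(col_b) / b
    m_x = (a * m_a + b * m_b) / n          # grand mean of the pooled sample
    lhs = a * (m_a - m_x) ** 2 + b * (m_b - m_x) ** 2
    rhs = (a * (m_a - true_mean) ** 2 + b * (m_b - true_mean) ** 2
           - n * (m_x - true_mean) ** 2)
    return lhs, rhs

# Hypothetical columns of 3 and 5 scores, with an arbitrary assumed true mean.
lhs, rhs = between_column_identity([2.0, 5.0, 8.0], [1.0, 4.0, 4.0, 6.0, 10.0], 3.5)
print(lhs, rhs)
```

Changing the assumed true mean alters the three terms on the right individually but not their net value.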
If we denote the variances of the distributions of the column means by $\sigma_a^2$ and $\sigma_b^2$ and that of the
grand mean by $\sigma_x^2$, we have

$$a\sigma_a^2 = \sigma^2 = b\sigma_b^2 = n\sigma_x^2 \qquad (xxxi)$$

Each of the three terms on the right of (xxx) is thus a square score of unit variance, viz.:

$$\frac{s_c^2}{\sigma^2} = \frac{(M_a - M)^2}{\sigma_a^2} + \frac{(M_b - M)^2}{\sigma_b^2} - \frac{(M_x - M)^2}{\sigma_x^2} \qquad (xxxii)$$

We shall now assume that the score distribution is normal whence those of the column sample
means and that of the grand mean are normal, so we may write in the usual way as square standard
normal scores

$$c_a^2 = \frac{(M_a - M)^2}{\sigma_a^2} ; \qquad c_b^2 = \frac{(M_b - M)^2}{\sigma_b^2} \qquad (xxxiii)$$

In the third term of (xxxii), we note that

$$n(M_x - M) = a \,.\, M_a + b \,.\, M_b - nM = a(M_a - M) + b(M_b - M) ,$$
$$\therefore \; \frac{M_x - M}{\sigma_x} = \frac{a(M_a - M)}{n\sigma_x} + \frac{b(M_b - M)}{n\sigma_x} .$$

From (xxxi) we have

$$a^{1/2}\sigma_a = n^{1/2}\sigma_x = b^{1/2}\sigma_b ,$$

$$\therefore \; \frac{M_x - M}{\sigma_x} = \left(\frac{a}{n}\right)^{1/2} c_a + \left(\frac{b}{n}\right)^{1/2} c_b \qquad (xxxiv)$$

We shall now write

$$v_a = \left(\frac{a}{n}\right)^{1/2} c_a + \left(\frac{b}{n}\right)^{1/2} c_b ;$$
$$v_b = \left(\frac{b}{n}\right)^{1/2} c_a - \left(\frac{a}{n}\right)^{1/2} c_b .$$

The variance of the v-scores so defined is unity, since (a + b) = n and they constitute a pair of
normal standard scores. Also it is evident that

$$v_a^2 + v_b^2 = c_a^2 + c_b^2 \quad \text{and} \quad v_a^2 = \frac{(M_x - M)^2}{\sigma_x^2} .$$

Whence from (xxxii) and (xxxiii)

$$\frac{s_c^2}{\sigma^2} = c_a^2 + c_b^2 - v_a^2 = v_b^2 \qquad (xxxv)$$

Thus the statistic on the right is a square normal standard score, i.e. a Chi-Square variate of
1 d.f.


Let us now consider the residual statistic $s_z^2$ of (viii) in 13.07. For the two-column set-up
we defined it as

$$(n - 2)s_z^2 = \sum_{j=1}^{j=a}(x_{aj} - M_a)^2 + \sum_{j=1}^{j=b}(x_{bj} - M_b)^2 ,$$

$$\therefore \; \frac{(n - 2)s_z^2}{\sigma^2} = \sum_{j=1}^{j=a}\frac{(x_{aj} - M_a)^2}{\sigma^2} + \sum_{j=1}^{j=b}\frac{(x_{bj} - M_b)^2}{\sigma^2} \qquad (xxxvi)$$

We may assume that the column samples are independent, and the expression on the right is
therefore made up of two independent Chi-Square variates of (a − 1) and (b − 1) degrees of
freedom respectively. Hence the distribution of their sum is that of a Chi-Square variate of
(a − 1) + (b − 1) = (n − 2) degrees of freedom.

For a table of more than two columns with an equal number (r) of score values in each, we
may define as a Chi-Square variate of c(r − 1) degrees of freedom

$$\frac{c(r - 1)s_z^2}{\sigma^2} = \sum_{i=1}^{i=c}\sum_{j=1}^{j=r}\frac{(x_{ij} - M_i)^2}{\sigma^2} \qquad (xxxvii)$$


In terms of the parameters of the score-grid 
When our concern is with only one criterion of classification, the numbers of items ($r_i$) in each
column need not be the same, and if n is the total number of cell entries, i.e. n = rc in the above,


hee i=c j=", 
: - e = MV.) and Botes > > aa T .  (Xxxviii) 
1=1j=1 


16.05 THE PAIRED DIFFERENCE TEST FOR SMALL SAMPLES


In Chapter 7 of Vol. I we have distinguished between two ways of investigating a real 
difference in the domain of representative scoring : 


(a) comparison of the mean scores of groups subjected to different treatments as in 16.06 
below ; 


(b) comparison of response of one and the same individual before and after treatment or 
of the effect of different treatments on pairs of individuals the two members of which 
share a common peculiarity. 


The second procedure involves the null hypothesis that the true mean difference between
paired scores is zero; and we can test its validity if entitled to assume that the d-score (i.e. paired
score difference) distribution is normal, as it will be if we regard each pair of observations as a
2-fold sample from a normal universe. On that assumption, we can safely apply the c-test for
a normally distributed score, if the sample is large. If $M_d$ is the mean difference and $s_m^2$ is an
unbiased estimate of the true variance $\sigma_m^2$ of its distribution, the appropriate ratio for a p-fold
sample of paired scores is as given by (xix) in 14.08:

$$c = \frac{M_d}{s_m} \qquad (i)$$

In view of what follows, it is important to stress that $s_m^2$ is an unbiased estimate of $\sigma_m^2$ as
defined by the relations implicit in the above, viz.:

$$s_m^2 = \frac{1}{p^2}\sum_{r=1}^{r=p} d_r^2 \qquad (ii)$$

If M is the true mean and $\sigma_d^2$ is the unknown true variance of the d-score distribution, we may
write in the symbolism of 13.02:

$$E_s \,.\, E_r(d_r - M)^2 = \sigma_d^2 = E_s \,.\, E_r(d_r^2) - M^2 .$$

In this case, the null hypothesis implies that M = 0,

$$\therefore \; E_s \,.\, E_r(d_r^2) = \sigma_d^2 ,$$

$$\therefore \; E_s\left(\frac{1}{p}\sum_{r=1}^{r=p} d_r^2\right) = \sigma_d^2 .$$

In the preceding expression

$$\sigma_d^2 = p\,\sigma_m^2 .$$

Whence as in (ii):

$$E_s(s_m^2) = \sigma_m^2 .$$
The exact c-ratio ($M_d \div \sigma_m$) corresponding to (i) is a normal variate of unit variance. Unless
the null hypothesis (as in Mendel's laws) prescribes a priori values both of the true mean (in
this case, zero) and of the true variance ($\sigma_m^2$) of its distribution (here, unknown), the c-test for
paired differences is inexact in the sense that an unbiased estimate ($s_m$) of the universe parameter
$\sigma_m$ replaces the latter as in the denominator of (i) above. The empirical critical ratio so defined is
therefore subject to sampling error; and its distribution is indeed determinable when the parent
universe is itself homogeneous and normal. Vis-à-vis the significance of paired differences, this
limits its usefulness to situations in which we can legitimately postulate a homogeneous
distribution of d-scores. As indicated in 14.08, a d-score distribution is homogeneous if, and only
if, we can regard each pair as a 2-fold sample from a sub-universe with the same sampling
variance as any other such sub-universe.

In what follows our immediate concern is with the distribution of an empirical ratio for the 
mean score of samples from a homogeneous normal universe ; and its relevance to the problem 
of paired differences (d-scores) is one which we shall examine more fully in 16.07 below. We 
shall therefore denote by M, the mean of an n-fold sample of unit scores (x,), and make no 
assumption concerning the numerical value of the true mean (M) of the parent universe of 
x-scores. Accordingly, we define our unbiased estimate of the variance of the n-fold sample 
distribution in the usual way as 

r=n PE 2 
Bee ond 3 e Steer ols 


y=1 


a (111) 
Our unbiased estimate of the variance of the distribution of the sample mean (M,) will 
therefore be 

aoe r (x, ps MY 
lan 2 n(n — D ` 


r=1 


We may thus define an empirical ratio ($t$) by the relation

$$t^2 = \frac{(M_x - M)^2}{s_m^2} \tag{iv}$$

$$\therefore\ \frac{t^2}{n - 1} = \frac{(M_x - M)^2}{\sigma_m^2} \div \sum_{r=1}^{r=n} \frac{(x_r - M_x)^2}{\sigma^2} \tag{v}$$


In the foregoing expression the numerator is a true square standard score, i.e. a Chi-Square variate of 1 d.f. The denominator is a Chi-Square variate of $(n - 1) = f$ degrees of freedom, since we can write it as

$$\sum_{r=1}^{r=n} \frac{(x_r - M_x)^2}{\sigma^2} = \sum_{r=1}^{r=n} \frac{(x_r - M)^2}{\sigma^2} - \frac{n(M_x - M)^2}{\sigma^2} = \sum_{r=1}^{r=n} \frac{(x_r - M)^2}{\sigma^2} - \frac{(M_x - M)^2}{\sigma_m^2}.$$

This has the same form as (vi) in 16.01, and if we apply the appropriate orthogonal transformation, we have

$$\frac{t^2}{f} = \frac{u_1^2}{u_2^2 + u_3^2 + \dots + u_n^2}.$$

In the above our u-scores are by definition independent, and we have eliminated $u_1$ from the denominator. Hence the numerator and denominator are independent statistics, i.e. $(t^2 \div f)$ is the ratio of a Chi-Square variate of 1 d.f. to an independent Chi-Square variate of f degrees of freedom.

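Now we have seen in 15.04 that if x is the ratio of a Chi-Square variate of 1 d.f. to an independent Chi-Square variate of f degrees of freedom, its p.d. is

$$f(x) = \frac{x^{-\frac{1}{2}}}{B(\tfrac{1}{2}, \tfrac{1}{2}f)\,(1 + x)^{\frac{1}{2}(f+1)}}.$$

In this case $x = (t^2 \div f)$ and the simple scalar substitution of Case I in 15.03 yields

$$F(t^2) = \frac{(t^2)^{-\frac{1}{2}}}{B(\tfrac{1}{2}, \tfrac{1}{2}f)\,\sqrt{f}\,\left(1 + \dfrac{t^2}{f}\right)^{\frac{1}{2}(f+1)}} \tag{vi}$$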


The derivation of the p.d. of t itself in accordance with Case IV of 15.03 presupposes an
ulterior reason for believing that f(t) is a symmetrical function of t. The latter is the ratio of the
deviation of the sample mean from the true mean to the square root of the sample estimate of its 
variance; and the distribution of this ratio is necessarily symmetrical if the distribution of 
score deviations in the parent universe is itself symmetrical. This is easy to see of the discrete 
universe and hence of a hypothetical continuous universe in the limit. For a given numerical 
positive value of the score-sum or mean (M) we may pair off every uniquely constituted 
combination of unit scores with an otherwise identical set of reverse sign, their mean (— M) 
being numerically equivalent to M but negative. 

One illustration which the reader may explore more fully should suffice to make this clear, viz. extraction of 3-fold samples from a 7-class rectangular universe of score deviations −3, −2, −1, 0, +1, +2, +3. We need only consider score-sums of ± 3 with mean value ± 1. All corresponding combinations of unit scores consistent with these values are then

    Score-sum + 3 (mean + 1)        Score-sum − 3 (mean − 1)
        +3   0   0                      −3   0   0
        +3  +1  −1                      −3  −1  +1
        +3  +2  −2                      −3  −2  +2
        +3  +3  −3                      −3  −3  +3
        +2  +1   0                      −2  −1   0
        +2  +2  −1                      −2  −2  +1
        +1  +1  +1                      −1  −1  −1

Thus corresponding sample values of + M and − M will have equal frequency if corresponding negative and positive values of the unit scores ($x_r$) have equal frequency; and sample variances corresponding to each pair of numerically identical scores of reverse sign will be necessarily identical, and will not affect the ratio of $(M_x - M)$ to $s_m$. The distribution of the ratio therefore depends only on the distribution of $M_x$; and this is necessarily symmetrical, if the distribution of x-scores in the parent universe is symmetrical, as is true of the normal universe we postulate in this context.
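
The pairing argument can be checked by brute force. The following is a minimal sketch, assuming that a 3-fold sample means three draws with replacement from the seven score deviations (the text does not say so explicitly); the names `SCORES`, `sum_counts` and `sum_sq_dev` are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

SCORES = range(-3, 4)                      # 7-class rectangular universe: -3 ... +3
samples = list(product(SCORES, repeat=3))  # assumption: ordered draws with replacement

# Frequency of each score-sum; symmetry requires count(+s) == count(-s).
sum_counts = Counter(sum(s) for s in samples)

# Distinct (unordered) combinations with score-sum +3, i.e. mean +1.
plus3 = sorted({tuple(sorted(s, reverse=True)) for s in samples if sum(s) == 3},
               reverse=True)

def sum_sq_dev(sample):
    """Sum of squared deviations from the sample mean, in exact arithmetic."""
    mean = Fraction(sum(sample), len(sample))
    return sum((x - mean) ** 2 for x in sample)

# Reversing every sign leaves the sum of squared deviations (hence s_m) unchanged.
mirror_ok = all(sum_sq_dev(s) == sum_sq_dev(tuple(-x for x in s)) for s in samples)

print(plus3)
print(sum_counts[3], sum_counts[-3], mirror_ok)
```

Each sample with mean + M therefore has a mirror image with mean − M and the same estimated variance, which is all the symmetry of the ratio requires.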
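As in Example 5 of 15.03, we may thus write

$$g(t) = \frac{1}{B(\tfrac{1}{2}, \tfrac{1}{2}f)\,\sqrt{f}\,\left(1 + \dfrac{t^2}{f}\right)^{\frac{1}{2}(f+1)}} \tag{vii}$$

The table of the t-integral in Kendall's treatise gives the probability ($P_t$) of a value being numerically as great as or greater than ± t, i.e.

$$1 - P_t = \int_{-t}^{t} g(t)\,dt.$$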
In this expression g(t) is as defined in (vii). For an n-fold sample of x-scores, $f = n - 1$, and as in (v)

$$t = \frac{(M_x - M)\sqrt{n(n - 1)}}{\sqrt{\sum_{r=1}^{r=n} (x_r - M_x)^2}} \tag{viii}$$

For $f = (n - 1)$ in (vii) the table uses the symbol ν. Kendall's table cites t-ratios for 1 to 20 d.f., i.e. samples of 2 to 21 paired d-scores in this context.

The t-distribution so defined would be of no particular use if our null hypothesis did not in fact postulate the numerical value of the true mean, i.e. M = 0 for paired differences, when it is legitimate to use the t-test to assess their significance. Thus (viii) then becomes

$$t = \frac{M_d\sqrt{n(n - 1)}}{\sqrt{\sum_{r=1}^{r=n} (d_r - M_d)^2}} \tag{ix}$$


The orthogonal transformation employed in deriving the distribution of the Gosset ratio 
defined by (viii) presupposes that each d-score comes from a sub-universe having the same score 
distribution variance as each other sub-universe. The only sense in which the sub-universe 
from which we extract a particular pair of scores can then be different from a sub-universe from 
which we extract another is that the mean value of the unit sample from one is different from the 
mean value of the unit sample from the other. Thus the t-test commonly prescribed for paired 
differences is of far more limited application than the appropriate c-test, though more precise, 
if applicable. The distribution of t has, however, a special interest inasmuch as it discloses
how rapidly the distribution of an empirical ratio tends to normality as the size of the sample 
increases. 

In 16.04 we have found that the t-distribution approaches the normal very closely for samples larger than 50, but for smaller samples the discrepancy is large, as shown by the following figures for the probability that t in (ix) does not lie within the range ± 2 or ± 3:

                      ± 2                     ± 3
    f = n − 1   t-table   c-table       t-table   c-table
        5        0.102     0.046         0.0300    0.0027
       10        0.073     0.046         0.0134    0.0027
       20        0.059     0.046         0.0071    0.0027
        ∞        0.046     0.046         0.0027    0.0027
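
Figures of this kind can be recomputed from the density (vii) without tables. What follows is a rough sketch rather than the method by which the published tables were prepared: it integrates g(t) by Simpson's rule, and the function names and the step count are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

def t_density(t, f):
    """g(t) of (vii): 1 / (B(1/2, f/2) * sqrt(f) * (1 + t^2/f)^((f+1)/2))."""
    log_beta = math.lgamma(0.5) + math.lgamma(0.5 * f) - math.lgamma(0.5 * (f + 1))
    return math.exp(-log_beta - 0.5 * math.log(f)
                    - 0.5 * (f + 1) * math.log1p(t * t / f))

def t_two_tail(t, f, steps=2000):
    """Probability that the t-variate lies outside +/- t (steps must be even)."""
    h = t / steps
    total = t_density(0.0, f) + t_density(t, f)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_density(k * h, f)
    inside = 2.0 * total * h / 3.0   # by symmetry, twice the area from 0 to t
    return 1.0 - inside

def normal_two_tail(c):
    """Probability that a unit normal score lies outside +/- c."""
    return 2.0 * (1.0 - NormalDist().cdf(c))

for f in (5, 10, 20):
    print(f, round(t_two_tail(2.0, f), 3), round(t_two_tail(3.0, f), 4))
print("normal", round(normal_two_tail(2.0), 3), round(normal_two_tail(3.0), 4))
```

Raising f in the loop shows how quickly the two columns converge, which is the point the table is meant to make.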


Evidently, it will be grossly inaccurate to evaluate $P_t$ for the ratio defined by (ix) by recourse to the c-table (probability integral) when the size of the sample is small; but the error involved in using (i) to define the appropriate c-ratio is much less. To get the relation between t of (ix) and c of (i) into focus, we may write for brevity

$$\sum_{r=1}^{r=n} d_r^2 = S_2 \quad\text{and}\quad \sum_{r=1}^{r=n} (d_r - M_d)^2 = S_2 - n M_d^2.$$
We then have

$$t^2 = \frac{n(n - 1)M_d^2}{S_2 - nM_d^2} \quad\text{and}\quad c^2 = \frac{(f + 1)\,t^2}{f + t^2} \tag{x}$$

This equation defines the empirical value of the c-ratio in (i) corresponding to that of t in (ix) for a specified value of f. We may denote by $P_c$ the value assigned by the normal integral to the probability that (i) will not lie inside the range ± c, using $P_t$ as before in the ensuing table.


From the above we see that the use of (i) as a normal variate underestimates the odds in favour of significance as assigned by the exact distribution of the corresponding t variate of (ix), but the discrepancy is not very gross at the 5 per cent. level.

Numerical Example. In 7.07 of Vol. I we have used (i) above to test the effect of constriction of the vessels of a finger on the haemoglobin content of the blood drawn therefrom. The number of paired observations, each member of a pair on the same individual, was 39, so that f = 38. From the figures cited on page 316 of Vol. I we get

$$\sum d = 52;\qquad M_d = 1.33;\qquad \sum d^2 = 296;$$

$$\sum (d - M_d)^2 = \sum d^2 - n \cdot M_d^2 = 296 - 69.3 = 226.7;$$

$$t^2 = \frac{38 \cdot 39 \cdot M_d^2}{226.7} = \frac{38\,(69.3)}{226.7} = \frac{2633.4}{226.7} = 11.6;\qquad \therefore\ t = 3.41.$$
By recourse to (1) we obtain 
(241. 
x x x x x * 


Confidence Limits of an Estimated Mean. There is, however, another and important use for the t-distribution interpreted as that of the deviation of the mean ($M_x$) of the n-fold sample from the true mean (M) of a homogeneous normal parent universe. It may happen that we want to estimate M, in which case $M_x$ is an unbiased statistic; but it is possible to take a step further, i.e. to assign confidence limits between which M lies.




We shall see how to do this more readily, if we first assume that we know the value of M and $\sigma^2$, the variance of the u.s.d., whence also $\sigma_m^2 = (\sigma^2 \div n)$. We may denote the deviation of the sample mean from the true mean by $h = (M_x - M)$. By hypothesis, therefore, $(h \div \sigma_m) = c$ is a normal score of unit variance. At the 2σ level $h = 2\sigma_m$ and c = 2. The probability ($P_c$) that h will be numerically equal to or greater than $2\sigma_m$ is then given by the table of the normal integral in the form

$$1 - P_c = \sqrt{\frac{2}{\pi}} \int_0^2 e^{-\frac{1}{2}c^2}\,dc = 0.9545,$$

so that $P_c$ is a little under 0.05.

Let us now suppose that we know the value of $\sigma^2$ and hence of $\sigma_m$, but that we do not know the value of M. If $(M_x - M)$ lies within the limits $\pm h = \pm 2\sigma_m$, it follows that M lies within the limits $(M_x \mp h) = (M_x \mp 2\sigma_m)$. Either statement is true of 95 per cent. of all samples we meet, and we shall therefore err in only 5 per cent. of our samples, if we consistently set our estimate of M in the range from $M_x - 2\sigma_m$ to $M_x + 2\sigma_m$. For instance, we may suppose the sample mean is 11.5, and that the true value of $\sigma_m$ is 0.75, so that $2\sigma_m = 1.5$. A deviation of 1.5 either way signifies in this case that ± (11.5 − M) = ± 1.5, whence that M lies within the range from 10 to 13 inclusive. Such then are the limits of the range of admissible values of M at the 95 per cent. confidence level.
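
The arithmetic of this known-variance case is easily sketched. The function name `normal_limits` and its default multiplier are assumptions made for illustration; the figures 11.5 and 0.75 are those of the text.

```python
from statistics import NormalDist

def normal_limits(sample_mean, sigma_m, c=2.0):
    """Limits M_x -/+ c*sigma_m and the proportion of samples for which they hold."""
    coverage = 2.0 * NormalDist().cdf(c) - 1.0   # area of the unit normal within +/- c
    return sample_mean - c * sigma_m, sample_mean + c * sigma_m, coverage

low, high, coverage = normal_limits(11.5, 0.75)
print(low, high, round(coverage, 4))
```

The coverage returned is the value of the normal integral between − c and + c, so the same function serves for any other multiple of $\sigma_m$.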

The foregoing argument presupposes that we know the value of $\sigma_m$. This will rarely if ever be true in laboratory or field work, though it is easy to construct a model set-up in which it would be so. A more usual type of situation is that of the investigator who wishes to assign a value to the length of a piece of wire on the basis of successive observations (x) subject to an approximately normal distribution of instrumental error. He then has two sample parameters on which to base his judgment, viz. the sample mean $M_x$ and an estimate ($s_m^2$) of the variance ($\sigma_m^2$) of the mean, viz.:

$$s_m^2 = \sum_{r=1}^{r=n} \frac{(x_r - M_x)^2}{n(n - 1)}.$$
The t-distribution then defines that of the ratio $(M_x - M) \div s_m$. For a particular value of $f = (n - 1)$ the table of the t-integral cites how large t must be if P ≃ 0.05 is the probability that the value of t lies outside a prescribed numerical value. Let us suppose that the table cites t = ± a as the prescribed value, so that

$$M_x - M = \pm a\,s_m \quad\text{and}\quad M = M_x \pm a\,s_m.$$

We can then say that M lies within the limits $M_x \pm a \cdot s_m$ at the 95 per cent. confidence level, if we assign to a the tabular value prescribed by P ≃ 0.05.

For simplicity, we may take the foregoing figure for the observed mean $M_x = 11.5$ and assume that $s_m = 0.75$ for a sample of 10, so that f = 9. For this value of f the table of the t-integral gives t = 2.26 at the level P = 0.05, i.e. odds of about 20 : 1 against getting a value of t numerically equal to or greater than 2.26. This specifies a deviation ± (2.26)(0.75) = ± 1.695. At the 95 per cent. confidence level we shall thus say that the true mean will lie in the range 11.5 ± 1.695, or from 9.805 to 13.195. For f > 50 the normal integral will give a figure which does not appreciably differ from the result of proceeding as in this example. The reader should be able to interpret the appropriate procedure for any other confidence level (e.g. 99 per cent.).
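
The procedure can be set out as a short routine. This is a sketch only: the tabular value of t is supplied by the user from a table of the t-integral rather than computed, and the helper names `estimate_s_m` and `confidence_limits` are hypothetical.

```python
import math

def estimate_s_m(observations):
    """s_m: square root of sum((x - M_x)^2) / (n(n - 1)) for an n-fold sample."""
    n = len(observations)
    mean = sum(observations) / n
    sum_sq = sum((x - mean) ** 2 for x in observations)
    return mean, math.sqrt(sum_sq / (n * (n - 1)))

def confidence_limits(sample_mean, s_m, t_tabular):
    """Limits M_x -/+ a*s_m, where a is the tabular t for f = n - 1 and the chosen P."""
    half_width = t_tabular * s_m
    return sample_mean - half_width, sample_mean + half_width

# Figures of the text: M_x = 11.5, s_m = 0.75, f = 9, tabular t = 2.26 at P = 0.05.
low, high = confidence_limits(11.5, 0.75, 2.26)
print(round(low, 3), round(high, 3))
```

For raw observations one would first call `estimate_s_m` and then pass its two results, with the tabular t for the appropriate f, to `confidence_limits`.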


16.06 THE GROUP MEAN DIFFERENCE TEST


In contradistinction to the approximate c-test of 7.06 in Vol. I we have examined in 13.07 
an alternative approach to the recognition of a difference between the mean score of two 
independent samples. If one sample consists of a and the other of $b = (n - a)$ items, we may define two statistics as in 16.04 by the relations

$$s_c^2 = a(M_a - M_x)^2 + b(M_b - M_x)^2 \tag{i}$$

$$s_w^2 = \frac{1}{n - 2}\left[\sum_{j=1}^{j=a} (x_{aj} - M_a)^2 + \sum_{j=1}^{j=b} (x_{bj} - M_b)^2\right] = s_a^2 + s_b^2 \tag{ii}$$

In the last expression

$$s_a^2 = \frac{1}{n - 2}\sum_{j=1}^{j=a} (x_{aj} - M_a)^2 \quad\text{and}\quad s_b^2 = \frac{1}{n - 2}\sum_{j=1}^{j=b} (x_{bj} - M_b)^2.$$


As we have seen in 13.07 the ratio of the two is then equivalent to

$$\frac{s_c^2}{s_w^2} = \frac{ab(M_a - M_b)^2}{n(s_a^2 + s_b^2)} \tag{iii}$$

The statistics $s_c^2$ and $s_w^2$ defined by (i) and (ii) are each estimates of the true variance ($\sigma^2$) of the score distribution of the putative common universe of the null hypothesis, i.e. that the column samples do in fact come from one and the same universe. Their consistency is therefore a criterion of the absence of a difference between the column means other than such as might arise by random sampling. We may express the ratio defined by (iii) as

$$\frac{s_c^2}{s_w^2} = (n - 2)R^2,$$

$$\therefore\ R^2 = \frac{s_c^2}{\sigma^2} \div \frac{(n - 2)s_w^2}{\sigma^2} \tag{iv}$$


From (xxxii) and (xxxvi) of 16.04 we can see that $R^2$ expressed in this form is the ratio of a Chi-Square variate of 1 d.f. to a Chi-Square variate of $f = (n - 2)$ degrees of freedom. The problem of the distribution of R is therefore soluble, if we can show that these are statistically independent. We first recall a result obtained in 16.05, where we have seen that it is possible to express the numerator of (iv) in terms of two independent normal scores of unit variance, viz.

$$\frac{s_c^2}{\sigma^2} = v_1^2 + v_2^2 - \left[\left(\frac{a}{n}\right)^{\frac{1}{2}} v_1 + \left(\frac{b}{n}\right)^{\frac{1}{2}} v_2\right]^2 \tag{v}$$

In (v) the meaning of $v_1$ and $v_2$ is

$$v_1 = \frac{M_a - M}{\sigma_a} \quad\text{and}\quad v_2 = \frac{M_b - M}{\sigma_b}.$$
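We may transform the denominator in (iv) as follows:

$$\frac{(n - 2)s_w^2}{\sigma^2} = \sum_{j=1}^{j=a} \frac{(x_{aj} - M)^2}{\sigma^2} + \sum_{j=1}^{j=b} \frac{(x_{bj} - M)^2}{\sigma^2} - \frac{a(M_a - M)^2}{\sigma^2} - \frac{b(M_b - M)^2}{\sigma^2}$$

$$= \sum_{r=1}^{r=n} \frac{(x_r - M)^2}{\sigma^2} - \frac{(M_a - M)^2}{\sigma_a^2} - \frac{(M_b - M)^2}{\sigma_b^2}$$

$$= \sum_{r=1}^{r=n} c_r^2 - v_1^2 - v_2^2 \tag{vi}$$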




If we put $v_1^2 + v_2^2 + \dots + v_n^2 = (c_1^2 + c_2^2 + c_3^2 + \dots + c_n^2)$ in the usual way we may choose the linear constants connecting any v-score with the c-scores so that the v-scores are themselves independent normal scores of unit variance, and (vi) becomes

$$\frac{(n - 2)s_w^2}{\sigma^2} = v_1^2 + v_2^2 + v_3^2 + \dots + v_n^2 - v_1^2 - v_2^2 = v_3^2 + v_4^2 + \dots + v_n^2 \tag{vii}$$

Since $v_1$ and $v_2$ are independent of $v_3$, $v_4$, etc., (v) therefore defines a statistic which is independent of (vii).

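Thus the ratio $R^2$ in (iv) is a Type VI variate of the same form as (vi) in 16.05, and its square root (R) is a Type VII variate. If we write $(n - 2) = f$,

$$f(R) = \frac{1}{B(\tfrac{1}{2}, \tfrac{1}{2}f)\,(1 + R^2)^{\frac{1}{2}(f+1)}}.$$

We may express the square root of the ratio defined by (iii) in the form

$$t = \frac{M_a - M_b}{\sqrt{\left(\dfrac{1}{a} + \dfrac{1}{b}\right)(s_a^2 + s_b^2)}} \tag{viii}$$

$$\therefore\ F(t) = \frac{1}{B(\tfrac{1}{2}, \tfrac{1}{2}f)\,\sqrt{f}\,\left(1 + \dfrac{t^2}{f}\right)^{\frac{1}{2}(f+1)}}.$$

Thus the ratio t so defined is a t-variate of $f = (n - 2)$ degrees of freedom.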


Numerical Example. The following data refer to the bispinous (sacral) width of boys and girls aged 9 to 9½ years.

    Width (cm.)    No. of boys    No. of girls    Total
        6.5             0              2             2
        7.0             4              8            12
        7.5             6              4            10
        8.0             8              9            17
        8.5             6              1             7
        9.0             0              1             1
        9.5             1              0             1

       Total           25             25            50
From the above we have a = 25 = b and n − 2 = 48. We denote the boy's score by $x_a$ and the girl's by $x_b$:

$$\sum x_a = 197.50;\qquad M_a = 7.90;\qquad aM_a^2 = 1560.25;$$

$$\sum x_a^2 = 1569.25;\qquad \sum (x_a - M_a)^2 = \sum x_a^2 - aM_a^2 = aV_a = 9.00 \tag{ix}$$

$$\sum x_b = 188.50;\qquad M_b = 7.54;\qquad bM_b^2 = 1421.29;$$

$$\sum x_b^2 = 1430.75;\qquad \sum (x_b - M_b)^2 = \sum x_b^2 - bM_b^2 = bV_b = 9.46 \tag{x}$$

$$\sum x = 386.00;\qquad M_x = 7.72;\qquad nM_x^2 = 2979.92;$$

$$\sum (x - M_x)^2 = \sum x_a^2 + \sum x_b^2 - nM_x^2 = 20.08 \tag{xi}$$

$$(M_a - M_b)^2 = 0.1296 \tag{xii}$$

From (ix) and (x) we have

$$s_a^2 = \frac{\sum (x_a - M_a)^2}{n - 2} = \frac{9.00}{48} = 0.1875;\qquad s_b^2 = \frac{\sum (x_b - M_b)^2}{n - 2} = \frac{9.46}{48} = 0.1971.$$

Whence from (iii):

$$t^2 = \frac{ab(M_a - M_b)^2}{n(s_a^2 + s_b^2)} = \frac{625\,(0.1296)}{50\,(0.3846)} = 4.212;\qquad \therefore\ t = 2.05.$$

The corresponding ratio for the approximate c-test of 7.06 in Vol. I is given by

$$c^2 = \frac{ab(n - 1)(M_a - M_b)^2}{n\sum (x - M_x)^2} = \frac{625\,(49)\,(0.1296)}{50\,(20.08)} = 3.95;\qquad \therefore\ c = 1.99.$$
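
The whole computation can be repeated directly from the frequency table. The sketch below takes the first column of counts to be the boys and the second the girls, which is the reading consistent with the sums quoted in the text; the variable names are illustrative.

```python
widths = [6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5]   # bispinous width in cm.
boys   = [0, 4, 6, 8, 6, 0, 1]                 # frequencies, column a
girls  = [2, 8, 4, 9, 1, 1, 0]                 # frequencies, column b

def summarise(freqs):
    """Return (number of items, mean, sum of squared deviations from the mean)."""
    n = sum(freqs)
    mean = sum(w * k for w, k in zip(widths, freqs)) / n
    sum_sq = sum(k * (w - mean) ** 2 for w, k in zip(widths, freqs))
    return n, mean, sum_sq

a, m_a, ss_a = summarise(boys)
b, m_b, ss_b = summarise(girls)
n = a + b

# Exact test: s_a^2 and s_b^2 both carry the divisor (n - 2), as in the text.
s2_a = ss_a / (n - 2)
s2_b = ss_b / (n - 2)
t_sq = a * b * (m_a - m_b) ** 2 / (n * (s2_a + s2_b))

# Approximate c-test: the pooled sample supplies the variance estimate.
pooled = [x + y for x, y in zip(boys, girls)]
_, m_x, ss_total = summarise(pooled)
c_sq = a * b * (n - 1) * (m_a - m_b) ** 2 / (n * ss_total)

print(round(t_sq, 3), round(t_sq ** 0.5, 2))
print(round(c_sq, 2), round(c_sq ** 0.5, 2))
```

Working from the raw frequencies rather than from rounded intermediate figures is a convenient check on the hand computation.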


16.07 TESTING THE VARIANCE RATIO 


From quite different approaches, we have arrived at the definition of two statistics respec- 
tively for testing the reality of a difference between paired sets of observations and between 
group means, each statistic being expressible as a ratio of independent sample variances and 
hence as the ratio of two independent Chi-Square variates. This ratio (t?) is in fact a Type VI 
variate, but its square root is a Type VII variate, because the numerator happens to be a Chi- 
Square variate of 1 d.f. Except when we have to deal with a set-up having only 2 columns 
and/or two rows, the variance ratios defined in 13.03-13.06 as criteria of homogeneity in connexion 
with the procedure known as Analysis of Variance involve in both dimensions Chi-Square 
variates with more than one degree of freedom. Usually therefore the table of the t-integral is of 
no assistance in assessing their significance. 

On the assumption that the numerator is statistically independent of the denominator of 
such a ratio, the general procedure (F-test) follows from what we have learned in 15.04 and 16.04. 
We will suppose that $s_a^2$ is an unbiased estimate of the true variance ($\sigma^2$) with a degrees of freedom and $s_b^2$ another estimate with b degrees of freedom. To say this is to say that

$$\frac{a\,s_a^2}{\sigma^2} \text{ is a Chi-Square variate of } a \text{ degrees of freedom};\qquad \frac{b\,s_b^2}{\sigma^2} \text{ is a Chi-Square variate of } b \text{ degrees of freedom.}$$

As a criterion of the consistency of the two estimates we define the variance ratio

$$F = \frac{s_a^2}{s_b^2} \tag{i}$$

Let us now write

$$x = \frac{a\,s_a^2}{b\,s_b^2} = \frac{a}{b}\,F \tag{ii}$$

If $s_a^2$ and $s_b^2$ are independent statistics we have seen that x is a Type VI variate, being the ratio of a Γ(½a) to an independent Γ(½b) variate, i.e.

$$f(x) = \frac{x^{\frac{1}{2}(a-2)}}{B(\tfrac{1}{2}a, \tfrac{1}{2}b)\,(1 + x)^{\frac{1}{2}(a+b)}} \tag{iii}$$

Thus we have

$$f(x) = \frac{\left(\dfrac{a}{b}\right)^{\frac{1}{2}(a-2)} F^{\frac{1}{2}(a-2)}}{B(\tfrac{1}{2}a, \tfrac{1}{2}b)\left(1 + \dfrac{aF}{b}\right)^{\frac{1}{2}(a+b)}} = \frac{a^{\frac{1}{2}(a-2)}\,b^{\frac{1}{2}(b+2)}\,F^{\frac{1}{2}(a-2)}}{B(\tfrac{1}{2}a, \tfrac{1}{2}b)\,(b + aF)^{\frac{1}{2}(a+b)}} \tag{iv}$$

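To obtain the distribution of F, we make the usual scalar change, viz.:

$$\phi(F) = f(x)\,\frac{dx}{dF} = \frac{a}{b}\,f(x).$$

Whence from (iv) above

$$\phi(F) = \frac{a^{\frac{1}{2}a}\,b^{\frac{1}{2}b}\,F^{\frac{1}{2}(a-2)}}{B(\tfrac{1}{2}a, \tfrac{1}{2}b)\,(b + aF)^{\frac{1}{2}(a+b)}} \tag{v}$$

The expression defined by (v) is equivalent to the most general form of Type VI given by (xxi) of 15.04. We may write it alternatively as

$$\phi(F) = \frac{\left(\dfrac{b}{a}\right)^{\frac{1}{2}b} F^{\frac{1}{2}(a-2)}}{B(\tfrac{1}{2}a, \tfrac{1}{2}b)\left(\dfrac{b}{a} + F\right)^{\frac{1}{2}(a+b)}} \tag{vi}$$

In the above $(a^{-1}b)$ replaces k in (xxi) of 15.04 in which m = ½a and n = ½b. By (xxiii) of 15.04 the mean of the distribution is

$$M(F) = \frac{b}{b - 2} \tag{vii}$$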

Thus the expected value of F tends, as one might expect, to unity when b is large, since F is the ratio of two independent estimates of the same parameter; but it will be appreciably greater than unity when b is small. If the denominator of the F-ratio is the residual variance estimate defined in 13.03, it has the degrees of freedom corresponding to b, i.e. (r − 1)(c − 1). For a 3 × 4 table (r − 1)(c − 1) = 6 and the mean value of F will be 1.5.

The probability ($P_u$) that the ratio F will have a value as great as or greater than u, which we here assume to be greater than its expected value, is given by

$$P_u = \int_u^\infty \phi(F)\,dF \quad\text{and}\quad \int_0^u \phi(F)\,dF = 1 - P_u \tag{viii}$$

Of course, F may also be less than its expected value, and the probability ($P_v$) that it will be as small as or less than v is given by

$$P_v = \int_0^v \phi(F)\,dF \tag{ix}$$


To say that 5 per cent. of the area bounded by the curve lies in the range from u to ∞, i.e. $P_u = 0.05$, means that the odds are about 20 : 1 against getting a value of F as great as or greater than u. Likewise $P_v = 0.05$ signifies that the area from 0 to v is also 5 per cent. of the area bounded by the curve, and the odds are about 20 : 1 against getting a value of F no greater than v. However we need not concern ourselves with the improbability of getting a value of F less than the mean, if we take advantage of the reciprocal property of the Type VI variate (Ex. 3, 15.04). If Z is the reciprocal of F, i.e. $Z = F^{-1}$,

$$\psi(Z) = \frac{\left(\dfrac{a}{b}\right)^{\frac{1}{2}a} Z^{\frac{1}{2}(b-2)}}{B(\tfrac{1}{2}a, \tfrac{1}{2}b)\left(\dfrac{a}{b} + Z\right)^{\frac{1}{2}(a+b)}} \tag{x}$$

This is a Type VI variate of the same form as (vi) with interchange of constants, i.e. degrees of freedom. When F = u, $Z = u^{-1}$ and when F = ∞, Z = 0. The change of sign in the transformation from (vi) to (x) as we have noted in 15.02 means that

$$\int_u^\infty \phi(F)\,dF = -\int_{u^{-1}}^{0} \psi(Z)\,dZ,$$

$$\therefore\ \int_u^\infty \phi(F)\,dF = \int_0^{u^{-1}} \psi(Z)\,dZ.$$

If $v = u^{-1}$, the probability that F will have a value as great as or greater than u is therefore the same as the probability that Z will have a value as small as or less than v. Now our concern is merely with the consistency of the estimates $s_a^2$ and $s_b^2$. It is therefore immaterial whether we choose $s_a^2$ as the numerator and $s_b^2$ as the denominator of F or vice versa, so long as we use the appropriate Type VI variate, viz. (vi) or (x), as the case may be. For economy of tabulation we may define $s_a^2$ as the greater of the two estimates, so that F itself is always greater than unity.

Evidently complete tables of F so defined for a wide range of corresponding values of a and b would fill a bulky volume. Snedecor's condensed table gives two entries in each cell for corresponding values of a ($a_1$, $a_2$, etc. below) and b ($b_1$, $b_2$, etc. below). The first ($F_{0.05}$) is the numerical value of the variance ratio bounding the 5 per cent. tail of the distribution, and the second ($F_{0.01}$) is that which bounds the 1 per cent. tail, i.e. the odds are about 20 : 1 against getting a value of F as great as or greater than $F_{0.05}$ and the odds are about 100 : 1 against getting a value of F as great as or greater than $F_{0.01}$. The lay-out is then as follows:

              a_1       a_2       a_3       a_4
    b_1
    b_2                 F_0.05
                        F_0.01
    b_3

If a = 4 and b = 8, the two entries are 3.84 and 7.01. This means that for the ratio $F = s_a^2 \div s_b^2$ of two estimates $s_a^2$ with 4 and $s_b^2$ with 8 degrees of freedom, the odds are about 20 : 1 against getting a value F > 3.84 and about 100 : 1 against getting a value F > 7.01. In other words, F = 3.5 is below the 5 per cent. significance level, F = 5 is above the 5 per cent. but below the 1 per cent. significance level, F = 10 is above the 1 per cent. significance level, i.e. there are adverse odds of more than 100 : 1 against getting a value so large. Any such value of F of course implies that $s_a^2 > s_b^2$. The reciprocal of the value 7.01 is approximately 0.14. If our tables recorded the value of F for $s_b^2 > s_a^2$ we should therefore find the entry $F_{0.01} = 0.14$ for a = 8 and b = 4.
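
Both the tabular entries and the reciprocal property can be checked numerically. The sketch below integrates the Type VI density of the variance ratio by Simpson's rule; it is an illustration under stated assumptions (more than 2 degrees of freedom in the numerator, so that the density vanishes at the origin), not the method used to prepare Snedecor's table, and the function names are hypothetical.

```python
import math

def f_density(F, a, b):
    """phi(F) = (b/a)^(b/2) F^(a/2 - 1) / (B(a/2, b/2) (b/a + F)^((a+b)/2))."""
    if F <= 0.0:
        return 0.0   # valid at F = 0 only when a > 2 (assumption of this sketch)
    log_beta = (math.lgamma(0.5 * a) + math.lgamma(0.5 * b)
                - math.lgamma(0.5 * (a + b)))
    log_phi = (0.5 * b * math.log(b / a) + (0.5 * a - 1.0) * math.log(F)
               - log_beta - 0.5 * (a + b) * math.log(b / a + F))
    return math.exp(log_phi)

def lower_area(x, a, b, steps=4000):
    """Area under phi from 0 to x by Simpson's rule (steps must be even)."""
    h = x / steps
    total = f_density(0.0, a, b) + f_density(x, a, b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f_density(k * h, a, b)
    return total * h / 3.0

def upper_tail(u, a, b):
    """P_u: probability of a variance ratio as great as or greater than u."""
    return 1.0 - lower_area(u, a, b)

p_05 = upper_tail(3.84, 4, 8)           # tabular 5 per cent. entry for a = 4, b = 8
p_01 = upper_tail(7.01, 4, 8)           # tabular 1 per cent. entry
p_recip = lower_area(1 / 7.01, 8, 4)    # reciprocal ratio with d.f. interchanged
print(round(p_05, 4), round(p_01, 4), round(p_recip, 4))
```

The last figure agrees with the second because the probability that F exceeds u is the probability that its reciprocal, with degrees of freedom interchanged, falls below 1/u.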

En passant, it is worth while to recall the distinction we have drawn between vector and modular
probability in Chapter 5 of Vol. I. When we speak of a 5 per cent. significance level in connexion 
with the normal distribution, that of the t-variate or of other symmetrical function with mean 
as origin, we commonly specify a range of numerically equivalent score values of opposite sign 
with a total expectation of 0-05, i.e. an expectation of 0-95 that the score value will neither be 
as great as a given score value nor as little as the same score value with reverse sign. This is 
a significance level referable to modular probability. The significance level we specify above 
in connexion with the F ratio is referable to vector probability as defined elsewhere, since we 
are concerned only with the improbability of getting a value as great as or greater than the 
observed one. 

To justify the use of the F-test it is, of course, necessary to establish the statistical independence of the statistics $s_a^2$ and $s_b^2$ in the numerator and denominator. In 13.03 we have seen that
homogeneity w.r.t. 2 criteria of classification implies the consistency of the estimate s? with 
Sands Le 
(c — 1)s? (r — 1)s? 


Fine pe A 


For the 3 x 4 table we obtained in 16.04 above : 


t— De De 
A a a TEs 


F,= 


Oo 

By definition 
eshe > (M; — oF = 3" -~ sea (M, — My 
t= } AA — 


2 
oO j=1 Or 
For the same set-up, therefore,


(r — 1)s? 


2 
— [= (å + 43 + 43) — 4 = 43 + u3. 


O 
More generally, 
onc ae 2 s=re A 1 2 s=r 
(r x E $ 3 u? and (7 7 5 5% > Ww, 
s=r+c s=2 


Thus the denominator of $F_c$ so expressed contains no one of the independent u-scores present in the numerator. Similar remarks apply mutatis mutandis to the ratio $F_r$.
For a rapid computation we employ the symbolism of 11.05 (p. 448-458) and 13.04 (p. 552), viz.:

$$S_q = \sum_{j=1}^{j=r}\sum_{i=1}^{i=c} x_{ij}^2;\qquad S = rc \cdot M^2;\qquad S_c = r\sum_{i=1}^{i=c} M_i^2;\qquad S_r = c\sum_{j=1}^{j=r} M_j^2.$$
In this notation 


(c— 1)? = S. — S; (r — 1)? = S, — S; (r — Dc — Ds = S + S—S,—S,. 
Whence for testing column and row effects against residual variance,

$$F_c = \frac{(r - 1)(S_c - S)}{S_q + S - S_c - S_r};\qquad F_r = \frac{(c - 1)(S_r - S)}{S_q + S - S_c - S_r} \tag{xi}$$


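The case which arises when there are only 2 columns, so that (c − 1) = 1 and hence there are r pairs of scores, is of special interest. In this case we may label the two column means as $M_a$ and $M_b$ respectively, distinguishing corresponding raw-scores as $x_{aj}$ and $x_{bj}$, so that $M_j = \frac{1}{2}(x_{aj} + x_{bj})$ and

$$S_q = \sum_{j=1}^{j=r} (x_{aj}^2 + x_{bj}^2);\qquad S = 2r \cdot M^2 = \frac{r}{2}(M_a + M_b)^2;$$

$$S_c = r(M_a^2 + M_b^2);\qquad S_r = c\sum_{j=1}^{j=r} M_j^2 = \frac{1}{2}\sum_{j=1}^{j=r} (x_{aj} + x_{bj})^2.$$

It is possible to express these quantities in terms of the paired score differences $(x_{aj} - x_{bj}) = d_j$ and their mean $M_d = (M_a - M_b)$, in virtue of the identities

$$x_{aj} + x_{bj} = 2x_{aj} - d_j \quad\text{and}\quad M_a + M_b = 2M_a - M_d.$$

We then have

$$S_q = \sum_{j=1}^{j=r} (2x_{aj}^2 - 2x_{aj} d_j + d_j^2);\qquad S = 2r M_a^2 - 2r M_a M_d + \tfrac{1}{2} r M_d^2;$$

$$S_c = 2r M_a^2 - 2r M_a M_d + r M_d^2;\qquad S_r = \sum_{j=1}^{j=r} (2x_{aj}^2 - 2x_{aj} d_j + \tfrac{1}{2} d_j^2);$$

$$S_c - S = \tfrac{1}{2} r M_d^2 \quad\text{and}\quad S_q + S - S_c - S_r = \tfrac{1}{2}\sum_{j=1}^{j=r} d_j^2 - \tfrac{1}{2} r M_d^2.$$

Whence by substitution in (xi) we obtain

$$F_c = \frac{r(r - 1)M_d^2}{\sum_{j=1}^{j=r} d_j^2 - r M_d^2} = \frac{r(r - 1)M_d^2}{\sum_{j=1}^{j=r} (d_j - M_d)^2} \tag{xii}$$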




Numerical Example. The following simple set of figures will serve to illustrate both computations involved in the use of the F-test to evaluate the nullity of the column criterion and its identity with the t-test for paired differences when there are only 2 columns and 2 criteria of classification. The set-up is for 3 rows (individuals) and 2 columns (successive score values) on the same individual:

               i = 1    i = 2
    j = 1        2        4
    j = 2        3        3
    j = 3        4        8

Whence we have
Column Means: $M_a = 3$; $M_b = 5$.
Row Means: $M_1 = 3$; $M_2 = 3$; $M_3 = 6$.
Grand Mean: M = 4.
Total Sum of Squares: $S_q = 118$.

$$S = rc \cdot M^2 = 96;\qquad S_c = r\sum_{i=1}^{i=2} M_i^2 = 3(3^2 + 5^2) = 102;$$

$$S_r = c\sum_{j=1}^{j=3} M_j^2 = 2(3^2 + 3^2 + 6^2) = 108;$$

$$F_c = \frac{(r - 1)(S_c - S)}{S_q + S - S_c - S_r} = \frac{2(102 - 96)}{118 + 96 - 102 - 108} = 3.$$

The differences are

$$d_1 = -2;\qquad d_2 = 0;\qquad d_3 = -4;\qquad M_d = -2;$$

$$\sum_{j=1}^{j=3} (d_j - M_d)^2 = 0 + 4 + 4 = 8;$$

$$t^2 = \frac{r(r - 1)M_d^2}{\sum_{j=1}^{j=3} (d_j - M_d)^2} = \frac{3 \cdot 2 \cdot 4}{8} = 3.$$
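
The identity of the two computations is readily verified. In the following sketch the six scores are those implied by the column means, row means, total sum of squares and differences quoted in the text; the second row (3, 3) is a reconstruction from its row mean and its zero difference, and the variable names are illustrative.

```python
table = [(2, 4), (3, 3), (4, 8)]   # rows: individuals; columns: successive scores
r, c = len(table), 2

grand_mean = sum(sum(row) for row in table) / (r * c)
col_means = [sum(row[i] for row in table) / r for i in range(c)]
row_means = [sum(row) / c for row in table]

S_q = sum(x * x for row in table for x in row)   # total sum of squares
S = r * c * grand_mean ** 2
S_c = r * sum(m * m for m in col_means)
S_r = c * sum(m * m for m in row_means)

# Variance ratio for the column criterion against residual variance, as in (xi).
F_c = (r - 1) * (S_c - S) / (S_q + S - S_c - S_r)

# Square of the t-ratio for the paired differences, as in (xii).
d = [x_a - x_b for x_a, x_b in table]
M_d = sum(d) / r
t_sq = r * (r - 1) * M_d ** 2 / sum((x - M_d) ** 2 for x in d)

print(S_q, S, S_c, S_r, F_c, t_sq)
```

Any other table with two columns may be substituted for `table`; the two final figures will always agree, as (xii) requires.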
K + * * * * x 


Evaluation of t-Test for Paired Differences. From one viewpoint the identity exhibited in (xii), namely, that $F_c$ is equivalent to the so-called Student statistic $t^2$ for r paired differences when c = 2, is not surprising, inasmuch as we have seen that Type VII is the distribution of the square root of a Type VI variate when the numerator of the F-ratio is a Chi-Square variate of 1 d.f. Indeed, we obtained (vii) in 16.05 from the same equation as (iii) above. If f stands for the d.f. of the t-variate, the relevant substitutions made in deriving (vii) of 16.05 and (vi) imply the formal identity established above, viz.:

$$F = t^2 \quad\text{and}\quad \frac{a}{b} = \frac{1}{f} \quad\text{when } c = 2,\ f = (r - 1).$$
In short, the paired difference test based on the t-distribution of 16.05 is exactly the same
as a homogeneity test when the end in view is to decide whether a difference between column 
means is wholly attributable to residual sources of variation after eliminating variation associated 
with the row (pairing) effect. Now this identity brings sharply into focus a latent and not 
commonly recognized assumption in the prescription of the t-test for paired differences. Im- 
plicitly, we postulate that any row effect is strictly additive, in accordance with the following 
schema of score components : 


    Column A                  Column B                  Difference
    x_a1 = e_a1 + F_1         x_b1 = e_b1 + F_1         d_1 = e_a1 − e_b1
    x_a2 = e_a2 + F_2         x_b2 = e_b2 + F_2         d_2 = e_a2 − e_b2
    x_a3 = e_a3 + F_3         x_b3 = e_b3 + F_3         d_3 = e_a3 − e_b3
    x_ar = e_ar + F_r         x_br = e_br + F_r         d_r = e_ar − e_br


In this assumed schema, the dispersion of our e-components accounts for all residual variation, 
if the null hypothesis under consideration is correct. Hence we may regard each pair as a 
sample from a sub-universe which differs from any other such sub-universe only in virtue of a 
factor F, determining the origin of the score distribution. In other words, our latent assumption is 
that we take our samples from strata of a score distribution different inter se in one respect alone, 
viz., that the mean score values of different strata are different, the variances of the sub-dis- 
tributions being identical. 
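
A small numerical illustration may make the additive schema concrete. The e-components and row factors below are hypothetical integers chosen only for the demonstration, and `paired_differences` is an illustrative name; the point is that a strictly additive row factor $F_j$ disappears from every d-score.

```python
def paired_differences(e_a, e_b, row_effects):
    """Form x_aj = e_aj + F_j and x_bj = e_bj + F_j, then return d_j = x_aj - x_bj."""
    x_a = [e + F for e, F in zip(e_a, row_effects)]
    x_b = [e + F for e, F in zip(e_b, row_effects)]
    return [xa - xb for xa, xb in zip(x_a, x_b)]

# Hypothetical e-components for r = 4 pairs (illustrative values only).
e_a = [3, -1, 2, 0]
e_b = [1, 1, -2, 0]

d_without_rows = paired_differences(e_a, e_b, [0, 0, 0, 0])
d_with_rows = paired_differences(e_a, e_b, [10, -25, 40, 7])
print(d_without_rows, d_with_rows)
```

If the row factor acted otherwise than additively, e.g. by multiplying the e-components, the two lists would differ and the d-scores would no longer share a common variance.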

That we do, in fact, assume equality of variance in the derivation of the t-test prescribed for 
paired differences will be evident, if we retrace our steps to the beginning of 16.05. To express 
the square of the mean d-score in standardised form as an eliminable component of the denomi- 
nator and hence to establish that the denominator and numerator are independent variates, we 
have to assume that we draw each d-score from a normal sub-universe with the same definitive 
parameter o. Otherwise, the appropriate orthogonal transformation is unrealisable. 

This raises an issue of practical importance : in what circumstances can we invoke the t-test 
as an appropriate procedure, if we take advantage of the possibility of pairing observations in the 
design of an enquiry? ‘The applicability of a t-test to evaluate the odds against a mean paired 
difference score exceeding its expected value of zero by such and such in fact demands answers 
to two questions : (a) whether we are entitled to regard members of one pair as different from 
another only in the sense that the mean of an indefinitely large number of observations on 
members of one pair may differ from the mean of an indefinitely large number of observations 
on another ; (b) whether we are entitled to regard successive observations on members of the 
same pair as referable to a normal universe. 

To the first question we can give a positive answer in very restricted circumstances, e.g. (a) if the members of a pair constitute measurements respectively made on one and the same individual before and after some treatment procedure; (b) if also the interval is sufficiently short to justify the assumption of no relevant change on the part of the individual and hence no source of variation other than error of measurement. The null hypothesis is then that paired differences arise in virtue of errors of measurement alone. If the technique of estimation is the same w.r.t. all such pairs, this implies that the variance of score values for successive individual measurements on one pair is the same as for all others. The example given in 7.07 (p. 315) conforms to this requirement. Each pair of observations involves local measurement of haemoglobin of the finger of one individual: (a) before ligation, (b) after ligation of the same finger of the same individual, the intervening period being short. In this set-up, the null hypothesis that ligation has no effect implies that the only source of variation is error of haemoglobin estimation.
By pairing our observations we make it possible to eliminate the variation arising from the
fact that mean haemoglobin observations made on different individuals are likely to be different ; 
but the mere fact that measurements of a pair refer to the same individual is not always a sufficient 
guarantee that our sample of d-scores is referable to sub-universes of equal variance. If the 
interval is long we may introduce diurnal or other sources of variation which we cannot interpret 
with confidence. Such is indeed a permissible criticism of the choice of a widely quoted
experiment used by Gosset himself to illustrate the uses of the t-test. In experiments on highly
standardised laboratory stocks reared under a highly standardised regimen, we may be entitled 
to dismiss this consideration ; but pairing of observations on different individuals will rarely 
justify the use of a significance test based on the assumption that the d-scores are samples from 
a homogeneous normal universe. If we investigate the effect of an increased milk ration on the 
growth of children, we may eliminate certain fairly obvious sources of error by selecting as one 
member of a pair for treatment an individual of the same sex, age and build as the untreated 
member ; but we have no sufficient reason for assuming that variation of response to treatment 
is uniform from pair to pair. 

The fact that we can rarely invoke with propriety the postulate of equal variance, unless the
only residual source of variation is instrumental, reinforces the relevance of the second question
raised above. In many types of experimentation, the range of sampling variation is small and
the number of score values consistent therewith is also small. In short, the situations in which 
we can most confidently condone the assumptions implicit in the t-test for paired score differences 
are often such as to dictate justifiable caution w.r.t. the assumption of normality. Fortunately, 
our excursion into the burette universe of 14.05 has a lesson to offer in this context, viz., that 
the mean difference of repeated pairs of observations on the material tends to normality fairly 
rapidly even if the number of different values consistent with good workmanship is small. 

Since we should therefore exercise restraint in using the t-test for paired differences, it is 
important to realise that the c-test prescribed in 7.07 of Vol. I and referred to in 16.05 of this 
chapter is for reasons set forth more fully in 14.08 free from the objection that it implicitly postu- 
lates equality of variance. Subject to the recognition that it shares with any approximate c-test 
(including the so-called Chi-Square Test for the 2 x 2 contingency table) a measure of uncertainty 
arising from the use of an estimate of the variance of the mean, it is equally applicable to any 
situation in which we can eliminate known sources of variation whether by pairing observations 
on the same individual or by pairing like individuals, as when one compensates for unequal 
overhead illumination or local temperature differences due to draughts by pairing off in a green- 
house pots of plants of the same stock subjected to different treatments. 
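The arithmetic of the paired-difference c-test can be sketched as follows. This is a minimal illustration only: the function name `paired_c_ratio`, the use of the (n - 1) divisor in estimating the variance, and the greenhouse figures are all assumptions made for the example, not values taken from the text. The sketch forms the d-scores, estimates the standard error of their mean from the sample itself, and refers the critical ratio to the normal integral.

```python
import math
from statistics import mean, variance

def paired_c_ratio(treated, control):
    """Return (mean d-score, estimated s.e. of the mean, critical ratio, tail area)."""
    d = [t - c for t, c in zip(treated, control)]   # one d-score per pair
    n = len(d)
    m = mean(d)
    # Estimated variance of the mean; the (n - 1) divisor is an assumption of this sketch.
    se = math.sqrt(variance(d) / n)
    c = m / se
    # Two-sided tail area of the unit normal curve beyond |c|.
    tail = math.erfc(abs(c) / math.sqrt(2))
    return m, se, c, tail

# Hypothetical yields for 8 pairs of pots, one treated and one untreated in each pair.
treated = [21.4, 19.8, 23.1, 20.5, 22.0, 18.9, 21.7, 20.2]
control = [20.1, 19.5, 21.8, 20.9, 20.6, 18.1, 20.3, 19.9]
m, se, c, tail = paired_c_ratio(treated, control)
print(f"mean difference {m:.3f}, s.e. {se:.3f}, c = {c:.2f}, tail area {tail:.4f}")
```

Since the variance of the mean is itself an estimate, the tail area so obtained is approximate, and the approximation is poorer when the number of pairs is small.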


16.08 THE CORRELATION RATIO 


Any significance test for homogeneity derives its rationale from the distribution of independent 
estimates of the variance (σ²) of the unit sample score distribution of the putative 
common universe. For a set-up involving one criterion of classification associated with the 
columns, our statistics are respectively referable to the variance of the column-means and of the 
mean variance of scores within the columns. For a 2-way lay-out we employ one estimate 
referable either to the column-means or to the row-means and a second statistic (s²) referable to 
the difference between the total variance ($V_s$) and the sum of these two. At first sight it would 
seem more reasonable to choose an estimate based on the total variance as our second statistic in 
either case, i.e. s² defined in 16.04 as 
$$s^2 = \frac{rc}{rc - 1}\,V_s \qquad \text{(i)}$$
From this expression we can derive the Chi-Square variate with (rc − 1) degrees of freedom : 


$$\frac{(rc - 1)s^2}{\sigma^2} \qquad \text{(ii)}$$

Thus z, defined as below, is the ratio of 2 Chi-Square variates, involving the ratio of 2 
unbiased estimates of σ² : 
2 eases i ; ; i ; -= 
(rc — 1)s? 
If we make the appropriate orthogonal transformation we see that this is not the ratio of 2 
independent Chi-Square variates. Thus we may express z in accordance with the procedure of 
16.04 as 


$$z = \frac{u_2^2 + u_3^2 + \dots + u_c^2}{(u_2^2 + u_3^2 + \dots + u_c^2) + (u_{c+1}^2 + u_{c+2}^2 + \dots + u_{rc}^2)} \qquad \text{(iv)}$$


In the foregoing expression the denominator contains all the squared u-scores present in the 
numerator, but we can express it as a function of the ratio of two independent sets of u-scores 
if we divide the numerator and denominator by $(u_2^2 + u_3^2 + \dots + u_c^2)$, so that 

$$z = \frac{1}{1 + x}, \qquad x = \frac{u_{c+1}^2 + u_{c+2}^2 + \dots + u_{rc}^2}{u_2^2 + u_3^2 + \dots + u_c^2} \qquad \text{(v)}$$


In (v) the numerator of x contains rc − (c + 1) + 1 = c(r − 1) terms like those of (xxxvii) in 16.04, 
if the columns contain an equal number (r) of rows. When our concern is with only one criterion 
of classification, we may write rc = n as in (xxxviii) of 16.04, so that c(r − 1) = (n − c). The 
denominator contains (c − 1) independent terms, so that x has the same distribution as 


(n — o)? 
(e 


We might arrive at this conclusion by using the tautology of the grid. By definition 


a TE a no ES n 
5 = — VOM) = VM) and $ = iV =a j” 
A as =V(M,) and — ae = V, = WM) + M(V.), 
E 2h A 1 
VM.) + MS M aa 


V(M,) 
In the above 


ace = M(V.), 


a MV.) ES (n T c)s3 e vi 
a) (e DE - . , A . (vi) 


Thus x is the ratio of two independent Chi-Square variates. If we write 
a = (n − c) and b = (c − 1) 


for brevity, so that a and b are the d.f. of numerator and denominator, the p.d. equation of x is 
given by 

$$f(x) = \frac{x^{\frac{1}{2}(a - 2)}}{B(\tfrac{1}{2}a, \tfrac{1}{2}b)\,(1 + x)^{\frac{1}{2}(a + b)}} \qquad \text{(vii)}$$


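In 15.02 we have seen that the p.d. equation of z = (1 + x)⁻¹ is therefore 

$$F(z) = \frac{z^{\frac{1}{2}(b - 2)}\,(1 - z)^{\frac{1}{2}(a - 2)}}{B(\tfrac{1}{2}a, \tfrac{1}{2}b)}$$

Or more fully 

$$F(z) = \frac{z^{\frac{1}{2}(c - 3)}\,(1 - z)^{\frac{1}{2}(n - c - 2)}}{B(\tfrac{1}{2}(n - c), \tfrac{1}{2}(c - 1))} \qquad \text{(viii)}$$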


Thus the distribution of the ratio defined by z is a Type I variate within the framework of the 
implicit assumption that the column samples come from the same normal universe. 

Evidently the tabulation of the Type I variate z could give us no information which we cannot 
derive from the Type VI variate x if our only concern were to establish homogeneity w.r.t. the 
column criterion of classification. The interest of its distribution lies in the fact that we can 
express either correlation ratio of a bivariate distribution as a function of the same form. We 
have seen in Chapter 10 of Vol. I that it is always possible to lay-out a bivariate frequency grid 
as a grid of one or other set of score values, as in the numerical example of the accompanying 
table ; and we may analyse either set of scores w.r.t. one criterion of classification by putting in 
the same column scores which go with one and the same value of the alternative set. The 
number of rows in each column will not necessarily be the same; but we have seen (13.07) 
that this is immaterial, when our concern is with a single taxonomic criterion, here taken as 
that of the column heading. The 2 score-grids so constructed respectively set out the relevant 


data for the evaluation of 


$$\eta_{ab}^2 = \frac{V(M_{a.b})}{V_a} \quad \text{and} \quad \eta_{ba}^2 = \frac{V(M_{b.a})}{V_b} \qquad \text{(ix)}$$
Frequency Grid 

                         A-scores 
                   0    1    2    3    Total No.    M_{a.b} 
B-scores     0     1    2    1    0        4           1 
             1 
             2 
Total No. 
M_{b.a} 


Score-Grid (A-scores) 


In the symbolism of this context we may define either ratio with appropriate alternative 
interpretation of the column criterion as 


$$\eta^2 = \frac{V(M_c)}{V_s} \qquad \text{(x)}$$


In this expression $V_s$ is the variance of the total score distribution. If the grid items are 
A-scores, $V_s = V_a$; and $M_c$ being the mean A-score for a column defined by a fixed B-score 
value, $M_c = M_{a.b}$. We then interpret the above η² as $\eta_{ab}^2$ in accordance with (ix). 
Conversely, for the B-score-grid $V_s = V_b$, $M_c = M_{b.a}$, and $\eta^2 = \eta_{ba}^2$. For either grid, 
the total number of entries is n = rc if the columns contain equal numbers of items, otherwise 

$$n = \sum_{i=1}^{i=c} r_i.$$


In either case, we may put 


$$\eta^2 = \frac{V(M_c)}{V(M_c) + M(V_c)}.$$
In this expression η² is identical with z as heretofore defined, and its distribution is defined 
by (viii). From (xix) of 15.04 we see that the mean value of z = η² is 

$$\frac{c - 1}{n - 1}.$$
We can test the significance of a correlation ratio defined by (viii) on the assumption of 
independence, and hence of zero correlation, by recourse to the F-table in virtue of the identity 

$$\frac{\eta^2}{1 - \eta^2} = \frac{1}{x} = u.$$

In virtue of the reciprocal property of the Type VI variate (Ex. 3 of 15.04), we have 

$$p(u) = \frac{u^{\frac{1}{2}(c - 3)}}{B(\tfrac{1}{2}(n - c), \tfrac{1}{2}(c - 1))\,(1 + u)^{\frac{1}{2}(n - 1)}}.$$

In this expression the appropriate F ratio is given by 

$$F = \frac{(n - c)\,u}{(c - 1)} = \frac{(n - c)\,\eta^2}{(c - 1)(1 - \eta^2)} \qquad \text{(xi)}$$


Numerical Example. From the score-grid for A-scores in the accompanying table we get 


$$V(M_c) = \sum_{i=1}^{i=3} \frac{r_i (M_i - M_s)^2}{16} = 0.218; \qquad \eta_{ab}^2 = \frac{V(M_c)}{V_a} = 0.1944,$$
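$$\frac{\eta_{ab}^2}{1 - \eta_{ab}^2} = \frac{0.1944}{0.8056} = 0.2413;$$

$$F = \frac{(n - c)\,\eta_{ab}^2}{(c - 1)(1 - \eta_{ab}^2)} = \frac{13 \times 0.2413}{2} = 1.57.$$

In (v) of 16.07, (3 − 1) = a = 2 and (16 − 3) = b = 13. The F table gives 

                         a = 2, b = 12     a = 2, b = 14 
5 per cent. level .          3.88              3.74 
1 per cent. level .          6.93              6.51 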

Thus the observed F-ratio is below the 5 per cent. significance level. If we proceed in the same 
way for the B-scores : 
$$V(M_c) = \sum_{i=1}^{i=4} \frac{r_i (M_i - M_s)^2}{16} = 0.1531; \qquad \eta_{ba}^2 = \frac{V(M_c)}{V_b} = 0.1324;$$

$$\frac{\eta_{ba}^2}{1 - \eta_{ba}^2} = \frac{0.1324}{0.8676} = 0.1526;$$

$$F = \frac{12 \times 0.1526}{3} = 0.6104.$$
In this case F < 1. Since the F-table cites significance levels only for values exceeding 
expectation, we make use of the reciprocal property of the F-variate, i.e. we ask what would 
be the expectation of getting a value as great as (0.6104)⁻¹ = 1.64. We must then reverse the 
degrees of freedom, by putting 16 − 4 = a = 12 and (4 − 1) = b = 3. The F-table gives 

                         a = 12, b = 3 
5 per cent. level .           8.74 
1 per cent. level .          27.05 


Again the result shows no significant departure from zero correlation. 
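The computation of a correlation ratio and its F-ratio from a score-grid can be sketched as below. This is an illustration under stated assumptions: the function names and the small two-column grid are hypothetical, and the sketch simply applies the definitions η² = V(M_c) ÷ (total variance) and F = (n − c)η² ÷ (c − 1)(1 − η²). The helper `f_from_eta` re-derives the two F-ratios of the numerical example from the quoted values of η².

```python
from statistics import fmean

def f_from_eta(eta_sq, n, c):
    """F-ratio with (c - 1) and (n - c) d.f. corresponding to a correlation ratio."""
    return (n - c) * eta_sq / ((c - 1) * (1 - eta_sq))

def correlation_ratio(columns):
    """Return (eta squared, F) for a score-grid given as a list of columns.

    Columns may hold unequal numbers of scores.
    """
    scores = [x for col in columns for x in col]
    n, c = len(scores), len(columns)
    grand = fmean(scores)
    total_var = sum((x - grand) ** 2 for x in scores) / n
    # Variance of the column means, each mean weighted by the number of rows in its column.
    var_means = sum(len(col) * (fmean(col) - grand) ** 2 for col in columns) / n
    eta_sq = var_means / total_var
    return eta_sq, f_from_eta(eta_sq, n, c)

# Hypothetical score-grid of 2 columns.
eta_sq, f_ratio = correlation_ratio([[0, 2], [2, 4]])
print(eta_sq, f_ratio)

# F-ratios recomputed from the correlation ratios quoted in the numerical example.
print(round(f_from_eta(0.1944, 16, 3), 2), round(f_from_eta(0.1324, 16, 4), 4))
```

When F falls short of unity, as for the B-scores, one takes the reciprocal and reverses the degrees of freedom before consulting the table.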


CHAPTER 17 


REGRESSION AND DISCRIMINATION 


17.00 REGRESSION IN REAL WORK 


So far we have gained our acquaintance with the concept of regression only in the domain of 
statistical models based on games of chance. We shall now consider it as a tool in the day’s 
work. Though the statistical procedure subsumed under the term regression owes its name to 
Galton and its literal meaning to Galton’s erroneous views about inheritance involving many 
gene substitutions, it is essentially one which physicists have used under a different designation 
for more than a century as a curve-fitting device due to Gauss. It is not easy to evaluate its 
proper uses nor to recognise what pitfalls beset its applications unless we delve into this back- 
ground, as we shall now do. 

In the domain of statistical models, it suffices to define regression of the linear type in purely 
algebraic terms, viz. regression of the B-score on the A-score is linear if the B-score means 
respectively associated with successive equally spaced values of the A-score increase by equal 
increments. To say this is to say that they constitute an arithmetic series and as such would fall 
on the same straight line if plotted against the A-scores graphically. When discussing experi- 
mental data, the geometrical definition has certain advantages which will emerge in what follows. 

The Gaussian origins of regression have to do with the problem of determining agreed values 
of the definitive constants of a straight line law. Needless to say, a linear law is not a common 
type in the exact sciences ; but a suitable score transformation suffices to reduce a non-periodic 
physical law to the linear form. For instance, we can express Boyle’s law (pv = k) in the form 
p = kd by the substitution d = v⁻¹, and plot mean values of d against fixed values of p to determine 
the regression constant k. Thus the issue, as stated above, is of more general interest than 
would appear at first sight. It arises in a multiplicity of situations which demand the adoption 
of a universally accepted numerical value for a physical constant on the implicit assumption that 
there exists a true law of a type amenable to expression in linear form. 
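The score transformation for Boyle's law may be sketched numerically as follows. The figures are hypothetical and free of error (generated from k = 24), and the function name `slope` is an assumption of the sketch. Since d is here the observed score and p the fixed one, the regression coefficient of d on p is 1/k, so that k is its reciprocal.

```python
from statistics import fmean

def slope(x, y):
    """Regression coefficient of y on x, i.e. Cov(x, y) / V(x)."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

pressures = [1.0, 2.0, 3.0, 4.0]        # fixed values of p (hypothetical units)
volumes = [24.0, 12.0, 8.0, 6.0]        # error-free v, generated from pv = 24
densities = [1 / v for v in volumes]    # the substitution d = 1/v makes the law linear

k_estimate = 1 / slope(pressures, densities)
print(k_estimate)
```

With real observations the mean value of d for each fixed p would take the place of the single error-free value used here.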

It is of paramount importance to recognise the implications of the assumption last stated. 
It may be easier to get them into focus if we take the simplest possible example of a physical 
law as a type specimen. One of the few examples of a familiar physical law commonly stated 
in the linear form is the law of Hooke (ut tensio sic vis) connecting the length (l) of a spring with 
the load or tension (t) applied for values of the latter not too near the elastic limit. The law 
is linear if 

l = k.t + C. 
In textbooks we commonly meet it in the form which the above assumes, if we denote by l₀ the 
unstretched length when t = 0: 

(l − l₀) = k.t. 


So stated, (l − l₀) = s is the stretch, and we may more briefly express the law as s = kt. This 
is, however, an idealised statement of any laboratory situation. 
In real life we do not expect that all our observations, however carefully made, will fall 
exactly on a straight line or other descriptive curve. Even if we can eliminate all extraneous 
sources of variation, e.g. condensation of moisture on the scale pan, we know that successive 
observations involving no change of the controlled variable—as when we use the same box of 
weights—will involve instrumental errors, e.g. discrepancies in successive readings of a vernier 
scale ; but we may have reason to believe that errors in this sense are not systematic or at least 
that we can arrange matters so that they are not. This is to say that numerically equivalent 
positive and negative deviations about an assumed true value cancel one another in the long run, 
the mean value of successive observations referable to one and the same value of the variable 
under experimental control—in this case load—being therefore an unbiased estimate of the 
one deemed to be true in this sense. 

In so far as we can rightly make this assumption, sufficiently justified by experience in many 
laboratory situations, statement of the law is wholly explicit if we write $M_{s.t}$ as the mean of 
an indefinitely large number of s-scores referable to the same t-score, so that 

$$M_{s.t} = k \cdot t$$

For a fixed set of t-values we may write $E(t) = M_t$ and $E(M_{s.t}) = M_s$, so that 

$$M_{s.t} - M_s = k(t - M_t) \qquad \text{(i)}$$


For a discrete distribution this expresses the belief that the universe or true mean values of s-scores 
associated with particular t-scores constitute an A.P. when arranged in accordance with corresponding 
equally spaced successive values of the t-scores, in which case certain tautologies 
established in 11.04 are necessarily true. In particular 

$$k = \frac{\operatorname{Cov}(s, t)}{V_t} \qquad \text{(ii)}$$


Equations (i) and (ii) above define the properties of the universe of our observations. We 
shall later see (17.01 below) that the sample statistic calculated on the same basis as (ii) is in 
fact an unbiased estimate of the true value (k) of the physical constant. First, however, it is 
important to be clear about all the assumptions we have so far made. Our interpretation of a 
physical law exhibiting the dependence of a B-score on an A-score presupposes that 


(a) to each observation $x_{b.a}$ for a fixed value of the A-score corresponds a true value, the 
regression score, from which the observation deviates by an error or e-score ($e_{b.a}$) ; 


(b) the regression scores lie on a straight line, each such score being identical with the 
corresponding mean B-score ($M_{b.a}$) in the universe of all samples ; 


(c) the observed B-score ($x_{b.a}$) for a fixed A-score thus consists of two additive 
components which we may express by the relation 

$$x_{b.a} = M_{b.a} + e_{b.a} \qquad \text{(iii)}$$


(d) these components are statistically independent, the e-scores being distributed randomwise 
about zero mean ; 


(e) the distributions of the B-score for different fixed values of the A-score therefore differ 
solely in virtue of the fact that different values of the regression score $M_{b.a}$ fix the origin of the 
distribution, whence $V_{b.a}$ is constant and the universe is homoscedastic in one dimension ; 


(f) since the regression score is constant for a fixed value of the A-score, the corresponding variance of 
the B-score distribution is the variance of the e-scores, i.e. 

$$V_{b.a} = \sigma_e^2$$
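That assumptions (a)-(f) suffice to make the slope statistic unbiased, without any appeal to normality, can be checked by exact enumeration in a small case. The sketch below is illustrative only: the true constant `K`, the fixed t-scores and the skew two-valued error distribution with zero mean are all hypothetical choices. It lists every possible sample of errors, weights the slope of each sample by its probability, and returns the long-run mean of the slope statistic.

```python
from fractions import Fraction as Fr
from itertools import product

K = Fr(3, 2)                                   # hypothetical true constant k
T = [Fr(1), Fr(2), Fr(4)]                      # fixed t-scores, one observation at each
ERRORS = {Fr(-1): Fr(2, 3), Fr(2): Fr(1, 3)}   # skew error distribution with zero mean

def sample_slope(t, s):
    """Sample statistic computed on the same basis as (ii): Cov(s, t) / V(t)."""
    n = len(t)
    mt, ms = sum(t) / n, sum(s) / n
    cov = sum((a - mt) * (b - ms) for a, b in zip(t, s))
    var = sum((a - mt) ** 2 for a in t)
    return cov / var

def expected_slope():
    """Exact long-run mean of the sample slope over all possible error samples."""
    total = Fr(0)
    for errs in product(ERRORS, repeat=len(T)):
        prob = Fr(1)
        for e in errs:
            prob *= ERRORS[e]
        s = [K * t + e for t, e in zip(T, errs)]   # observed score = true value + error
        total += prob * sample_slope(T, s)
    return total

print(expected_slope())
```

The use of exact fractions avoids rounding, so the equality of the long-run mean with K is not a matter of approximation.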


Within the sample, regression will not be exactly linear, but we can define a set of hypothetical 
regression scores ($x_{b.as}$) having this property, as in 11.04, and a sample statistic $k_s$ as the slope 
of the line on which they lie. The use of a sample statistic $k_s$ based on corresponding sample 
values of s and t to estimate k in (ii), as illustrated in the numerical example at the beginning of 
18.02, is what physicists call line-fitting by the method of least squares. We may regard the latter 
as a procedure for obtaining an estimate with a confidence interval as small as possible at a 
particular confidence level ; but a general proof that such an estimate is unbiased in the sense 
that the long-run mean result of applying the prescription is the true value of k is laborious. 
Accordingly, we shall defer its consideration till we have determined the unbiased estimate of 
the regression coefficient by another method. It will then take its place as a particular illustration 
of the principle of minimal variance, when we ask whether the unbiased estimate under consideration 
is also an efficient one. 

The least square method derives its rationale in part from the assumption of a normally 
distributed set of B-score values and a more general approach to the issue stated at the beginning 
of the last paragraph is easy to visualise. In determining an unbiased estimate of k, we shall 
make no assumptions about the law of the distribution of errors, any such assumption being 
indeed highly questionable in many real situations. The reader who has consulted treatises 
which present the use of regression analysis against the background of the so-called bivariate 
normal universe will notice that certain assumptions commonly presented as deductions from its 
properties are logically implicit in the concept of a physical law, in particular for the reason 
stated in (e) above, homoscedasticity, i.e. equal variance of the B-score distribution for different 
values of the A-score. 

Now this property is one-dimensional. Our formulation of a law relating the mean B-score 
to the A-score implies nothing about the distribution of the A-scores for fixed values of the 
B-score in the context of the experiment. The range of A-score values, and the number of each, 
depends on the way the investigator carries out the experiment ; and we are entitled to regard 
any one experiment as a sample of an indefinitely large number of experiments carried out in the 
same way. Commonly, the fixed A-scores, or independent variable in the Cartesian sense, 
represent the one easiest to control; but we may often reverse the procedure. For instance, 
we may fix the vernier of a micrometer to read when the stretch of a spring attains a certain limit 
and measure repeatedly what we have to add to the scale pan to achieve the result. If so, 
we make t in the relation s = kt our dependent variable and must reinterpret k accordingly. 

We shall refer again to this duality of regression in 18.03 below. Here it is admissible to 
forestall an unnecessary difficulty which the reader may have experienced if already acquainted 
with the concept of a bivariate normal universe. Within the framework of the dubious assumption 
that errors involved in determining the B-score for a fixed A-score and vice versa are both 
distributed normally, the solid model of a universe which is normal in both the B-dimension and 
the A-dimension of the grid corresponds to reality only in the sense that it embodies the possi- 
bility of carrying out an experiment in one of two ways. The conduct of any actual experiment 
is ipso facto unique in the sense that it refers to one set of fixed scores, the distribution of which 
depends on the observer’s choice. In so far as our concern is with the uncertainties arising 
from error in the procedure we do in fact adopt, we cannot therefore conceive of the experiment 
as a sample from a universe in which the score distribution is normal in both dimensions. If 
we choose to take equal numbers of measurements (B-scores for each of a particular set of 
A-scores), the assumption of normally distributed errors implies that our sampling universe is 
normal in the B-dimension and rectangular in the A-dimension, and if we chose to describe it 
in the language of 3-dimensional geometry the frequency surface is less like the sugar loaf of 
the bivariate normal universe than the outside of a Nissen hut. 


17.01 PRINCIPLE OF THE FIXED-A SET 


The models of 12.01-12.09 define the unit sampling distribution of different universes of correlation. 
We shall now attempt to break down the distribution for samples of more than one 
item from a bivariate universe so conceived in accordance with the considerations advanced 
in 12.00. 
Fig. 118. 2-face pack: unit sample distribution, with unit grids for the classes p, q, r, s, t. For explanation see text. 


Our model universe, from which we sample with replacement in conformity with the 
assumption that it is both discrete and infinite, will be a pack of cards made by gluing pairs of 
cards face upwards, so that one face bears 1, 2 or 3 hearts (A-score) and the other bears 1, 2 
or 3 spades (B-score) as shown in Fig. 118. The universe (unit sample) bivariate distribution is 


paired score (A.B)    1.1    2.1    2.2    2.3    3.3 
proportions            p      q      r      s      t 


That we sample with replacement for the reason set forth in 12.00 means that the distribution 
of 2-fold samples is deducible by the chessboard procedure (Fig. 119) of which the equivalent 
definitive multinomial is (p + q + r + s + t)². By successive application of the chessboard 
device, we can visualise the extraction of n-fold samples in accordance with terms of the expansion 
of (p + q + r + s + t)ⁿ. Thus the probability of getting one paired score of (1.1) and 2 
paired scores of (2.2) in a sample of 3 is 3pr², and the 3-fold sample grid is then 


            1    2    3 
       1    1    0    0 
       2    0    2    0        Cov (x_a, x_b) = 2/9 = V_a = V_b 
       3    0    0    0        r_ab = 1 


Fig. 119. Derivation of a 2-fold sample distribution from the 2-face card pack universe. 
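The chessboard derivation of such a probability can be imitated by brute enumeration. In the sketch below the numerical proportions assigned to p, q, r, s, t are hypothetical (any set summing to unity would serve), and the function name `structure_probability` is an assumption of the example. It runs through every ordered draw of 3 cards with replacement and adds up the probabilities of those which yield one (1.1) and two (2.2) cards.

```python
from fractions import Fraction as Fr
from itertools import product

# Hypothetical proportions for the five paired-score classes; they sum to 1.
USD = {(1, 1): Fr(1, 8), (2, 1): Fr(1, 4), (2, 2): Fr(1, 4),
       (2, 3): Fr(1, 4), (3, 3): Fr(1, 8)}

def structure_probability(structure):
    """Probability of an unordered sample structure when sampling with replacement."""
    target = sorted(structure)
    total = Fr(0)
    for draw in product(USD, repeat=len(structure)):   # every ordered draw
        if sorted(draw) == target:
            prob = Fr(1)
            for card in draw:
                prob *= USD[card]
            total += prob
    return total

p, r = USD[(1, 1)], USD[(2, 2)]
prob = structure_probability([(1, 1), (2, 2), (2, 2)])
print(prob, 3 * p * r ** 2)
```

The two printed values agree because 3 ordered draws correspond to the one sample structure, each with probability pr².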


We may classify our samples of 2, 3, etc., in fixed-A sets, i.e. sets of samples having the same 
border A-score distribution. For 2-fold samples (Fig. 120) of our 2-face card-pack model of 
5 paired score classes, there are 6 such sets, labelled $A_{200}$ ($x_a = 1$ twice), $A_{101}$ ($x_a = 1$ once and 
$x_a = 3$ once), etc. Fig. 121 shows all possible samples of 3 classified in the 10 fixed-A sets, $A_{300}$, 
$A_{210}$, $A_{201}$, etc. For the theory of sampling from a bivariate universe, a very important result 
is an immediate consequence of the fact that the universe of the unit sample distribution is 
infinite, i.e. that we may regard the replacement condition as valid. The pooled mean value 
of the B-score associated with any A-score present in the subset is the same for any fixed-A set 
as for the unit sample distribution referable to all permissible values of A. This follows from 
the way in which we weight the samples in conformity with the principle of equipartition of 
opportunity implicit in the chessboard procedure; but the student may find it instructive to 
check the rule by recourse to the 2-fold or 3-fold sample distribution of our 2-face card pack 
model as below. 

For the unit sample distribution we derive the mean B-score ($M_{b.a}$) associated with $x_a = 2$ 
as follows : 

$$M_{b.2} = \frac{q(1) + r(2) + s(3)}{q + r + s} = \frac{q + 2r + 3s}{q + r + s} \qquad \text{(i)}$$


For the fixed-A set $A_{120}$ we have 6 different types of sample structure of which one with frequency 
6pqr is 

(1.1) . (2.1) . (2.2) 

Within this sample the value of $M_{b.2}$ is ½(1) + ½(2) = 3/2. The sample itself consists of equal 
numbers of paired scores (1.1), (2.1), (2.2), with a total frequency of 6pqr as stated, whence 
the frequency of paired scores having the relevant values $x_a = 2$ and $x_b = 1$ or 2 is 4pqr, 
and this, divided by the total of such, must be our sample weight, when we pool the values 
of $M_{b.2}$ for the whole set. We then summarise the computation of $M_{b.2}$ for the entire set 
$A_{120}$ as follows : 
Fig. 120. Fixed A-set histogram for 2-fold sample. 

Fig. 121. Fixed A-set histogram for 3-fold sample. 
Sample                       Sample                     Total 
Structure                    Frequency     M_{b.2}      Frequency of (2 . x_b) 

(1.1) . (2.1) . (2.2)          6pqr          3/2            4pqr 
(1.1) . (2.1) . (2.3)          6pqs          2              4pqs 
(1.1) . (2.2) . (2.3)          6prs          5/2            4prs 
(1.1) . (2.1) . (2.1)          3pq²          1              2pq² 
(1.1) . (2.2) . (2.2)          3pr²          2              2pr² 
(1.1) . (2.3) . (2.3)          3ps²          3              2ps² 


The total of the last column in the above is 
2p(s² + r² + q² + 2rs + 2qs + 2qr) = 2p(s + r + q)². 
The total of the products of the last two columns is 
8pqs + 10prs + 6pqr + 2pq² + 4pr² + 6ps² = 2p(4qs + 5rs + 3qr + q² + 2r² + 3s²) 
= 2p(s + r + q)(q + 2r + 3s). 
Whence the mean value of $M_{b.2}$ for the entire set $A_{120}$ is 

$$\frac{2p(s + r + q)(q + 2r + 3s)}{2p(s + r + q)^2} = \frac{q + 2r + 3s}{s + r + q}$$
This identity arises from a more general relation inherent in our visualisation of sampling 
in a bivariate universe as equivalent to sampling independently with replacement from a universe 
of unit grids in accordance with the chessboard principle. The principle of the fixed set, as we 
may call it, is that the distribution of B-scores for a particular value of the A-score is the same 
in any fixed-A set and hence in the universe itself. One illustration will suffice to make this 
clear. From Fig. 121 we see that the distribution of the B-scores for $x_a = 2$ in the set $A_{021}$ is 


x_{b.2}     Weighted frequency 
1           (0 + 2qst + 2qrt + 0 + 0 + 2q²t) ÷ 2t(r + q + s) = q. 
2           (2rst + 0 + 2qrt + 0 + 2r²t + 0) ÷ 2t(r + q + s) = r. 
3           (2rst + 2qst + 0 + 2s²t + 0 + 0) ÷ 2t(r + q + s) = s. 
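The reader who prefers to check the principle of the fixed set by machine rather than by algebra may do so along the following lines. The sketch is a hedged illustration: the numerical proportions given to p, q, r, s, t are hypothetical, and the function names are assumptions of the example. It enumerates every ordered 3-fold draw, keeps those belonging to a stated fixed-A set, and pools the B-scores paired with a chosen A-score, each weighted by the probability of the draw in which it occurs.

```python
from collections import Counter
from fractions import Fraction as Fr
from itertools import product

# Hypothetical proportions p, q, r, s, t of the classes 1.1, 2.1, 2.2, 2.3, 3.3.
USD = {(1, 1): Fr(1, 10), (2, 1): Fr(1, 5), (2, 2): Fr(3, 10),
       (2, 3): Fr(1, 10), (3, 3): Fr(3, 10)}

def b_distribution(a_counts, a_value, size=3):
    """Pooled distribution of B-scores paired with a_value inside the fixed-A set
    whose border A-score counts (for A = 1, 2, 3) are a_counts, e.g. (1, 2, 0)."""
    weights = Counter()
    for draw in product(USD, repeat=size):
        counts = tuple(sum(1 for a, _ in draw if a == v) for v in (1, 2, 3))
        if counts != tuple(a_counts):
            continue                      # draw belongs to another fixed-A set
        prob = Fr(1)
        for card in draw:
            prob *= USD[card]
        for a, b in draw:
            if a == a_value:
                weights[b] += prob
    total = sum(weights.values())
    return {b: w / total for b, w in sorted(weights.items())}

def universe_distribution(a_value):
    """Distribution of B-scores paired with a_value in the unit sample distribution."""
    sub = {b: f for (a, b), f in USD.items() if a == a_value}
    total = sum(sub.values())
    return {b: f / total for b, f in sorted(sub.items())}

in_set_120 = b_distribution((1, 2, 0), 2)
in_set_021 = b_distribution((0, 2, 1), 2)
print(in_set_120 == in_set_021 == universe_distribution(2))
print(sum(b * w for b, w in in_set_120.items()))   # pooled mean B-score for x_a = 2
```

Since the distributions coincide, so necessarily do their means, which is the particular result obtained algebraically above for the one fixed-A set.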
In the notation of our club-sandwich universe of all samples the principle of the fixed set is the 
formal identity M,., = M,..,; and its derivation follows directly from the initial assumptions 
and definitions of symbols. Our hypothesis is that an observed B-score (x,.,) has two additive 


components : the true value (M,. 4), being the long-run mean of repeated observations referable 
to a fixed A-score, and a random error (es. ac) independent thereof, so that 


Xb.a I A A de and Mia = Ev X5a) = Mia EM re 


If we here conceive each stratum of our stratified universe to accommodate a sufficiently large 
number of experiments referable to the same set of A-scores, we presume that negative and 
positive errors cancel, i.e. | 
E. e Ele. 50) =O = EM. : AR 

e" Moa — E (Ms. acs) E Mys -f EAM, . ae) me M5. q» 


The model of Fig. 118 is consistent with linear regression if we put q = s in the u.s.d. ; but it 
does not embody 2 essential properties of the Gaussian bivariate universe as defined in 17.00, 
viz. : (a) homoscedasticity or equal variance of B-scores for any fixed value of the A-score ; (b) 
distribution of errors (i.e. deviations from column means) about zero mean for each fixed value 
of the A-score. To satisfy either condition we must have at least 2 non-zero cell entries in each 
column of the universe grid, since the column variance will be zero if there is only one non-zero 
entry therein. The arrangement below shows how it is possible to satisfy both conditions, while 
still leaving two corner cells empty : 


A-score         1       2       3 
B-score   1     p       q       .         (2p + 8q + 2t) = 1 
          2     p       6q      t         Cov (x_a, x_b) = (p + t) − 2(t − p)² 
          3     .       q       t         V_a = 2(p + t) − 4(t − p)² 
                                          V_b = ¼ + ½(p + t) − (t − p)² 
M_{b.a}         1.5     2.0     2.5 
V_{b.a}         0.25    0.25    0.25 
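The column means and variances of such a grid are easily verified. The sketch below assumes the cell lay-out read from the arrangement above (p, p in the first column; q, 6q, q in the second; t, t in the third, with 2p + 8q + 2t = 1); the particular values given to p, q and t, and the function names, are hypothetical.

```python
from fractions import Fraction as Fr

def grid(p, q, t):
    """Universe grid {(a, b): proportion} with the two corner cells left empty."""
    assert 2 * p + 8 * q + 2 * t == 1
    return {(1, 1): p, (1, 2): p,
            (2, 1): q, (2, 2): 6 * q, (2, 3): q,
            (3, 2): t, (3, 3): t}

def column_moments(cells, a):
    """Mean and variance of the B-scores paired with a fixed A-score a."""
    col = {b: f for (x, b), f in cells.items() if x == a}
    total = sum(col.values())
    m = sum(b * f for b, f in col.items()) / total
    v = sum((b - m) ** 2 * f for b, f in col.items()) / total
    return m, v

cells = grid(Fr(1, 10), Fr(1, 20), Fr(1, 5))   # hypothetical p, q, t
for a in (1, 2, 3):
    print(a, column_moments(cells, a))
```

The column means rise by equal steps and the column variances are equal, whatever admissible values we assign to p, q and t.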


If the bivariate universe is homoscedastic as for the grid last shown, the principle of the fixed- 
A set carries an important consequence. Since the distribution of B-scores for a particular value 
of A is the same in any fixed-A set, the variance of the B-scores for a particular value of A is 
the same in each fixed-A set and hence the same as in the universe. Hence each fixed-A set is also 
homoscedastic if the universe itself is homoscedastic both with respect to B-scores and to their error 
components. For a particular value of the A-score the error variance is, of course, equivalent to 
the B-score variance, since the two distributions differ with respect to the mean only. Thus the 
error variance of a fixed-A set as a whole, or for a particular A-score of the fixed-A set, is that 
($\sigma_e^2$) of the whole universe. 

The principle of the fixed-A set invites us to visualise the complete distribution of r-fold 
samples from a bivariate universe by conceiving it as a 3-dimensional stratified grid. Each 
layer specifies a particular type of sample structure in conformity with the assumption that the 
sample consists of any combination of r paired scores not necessarily different. Each stratum 
consists of all layers of a fixed-A set and of no others. Within the stratum the number of identical 
layers tallies with the relative frequencies of the corresponding types of sample. Within the 
universal grid consisting of strata corresponding to every possible fixed-A set, the strata repeat 
themselves, the numbers of strata of one or other type being proportional to the frequency of 
the corresponding fixed-A set in the sample distribution. Having weighted our layers and strata 
in this way, it follows that each paired score occurs in the universal grid in the same proportion 
as in the unit sample distribution, whence any parameter of the whole grid is equivalent to the 
corresponding parameter of the u.s.d. 

Having so conceived the universe of the r-fold sample, we may label parameters referred 
to below with due regard to the conclusions stated above, viz.: (a) that the border A-score 
distribution of all layers in a stratum is the same; (b) that the mean B-score for a particular 
value of the A-score is the same for the whole stratum as for the universe. 


Stratum 
Layer (sample) (fixed-A set) Universe 

Mean B-score . i. See ae M, 
Mean for fixed-A set . My aes Mo E Mini My Ms a: 
Variance of B-score 

distribution . 5 PE Visa oF 
Mean A-score . Mi ee er. Pan M, 
Variance of A-score 

distribution . ae, ae. | fee O Sree a 
Covariance . : A a D Ae E TOA Cov (%,, xp) 


Regression Coefficient . Rya.cs ae Roa 
In this set-up we can visualise the layers as frequency grids of a columns and b rows or as 
score grids of 2 columns, one for A-scores and one for B-scores paired row by row as in a computing 
schema (see schemata at the end of 11.01). For the present, we shall adopt the former 
convention, defining a sequence of operations in accordance with the conventions of 11.06 as 
follows : 


Within For all values of A-score For fixed values of A-score 
Column . ; T eres OESE y A T, 
Layer : | ae Oe ier |e Soe 
Stratum . e E Bi a ee | ee Ge eae 

ae ie oe ee 
Whole Grid . ; E, E- be R E A - PER EO 


jer ee A PR 


The order of operations is not interchangeable, except as indicated. Otherwise, the symbol 
E, would be ambiguous, if we employed it in operations which we might otherwise distinguish 
as E, referable to all A-scores and E, referable to a sample or a fixed-A set. Without ambiguity 
we can drop the subscripts c and s in the symbol x,., for the B-score associated with a fixed 
A-score or in the error components defined below: but we must distinguish between 
Xa = (Xa — Ma) and Xa. s = (Xa — Ma. s) in virtue of the fact that the mean sample A-score 

a: es = Ma. a being that of the fixed-A set, is not the same as the universe mean 
M, =E(M,. s) With this convention, and with due regard to the fact that E, in the prescribed 
order is a within-layer or within-stratum operation, 


ELA...) = Ed m Ma.) = Ma. ee ME) 

EX.) =E Ma) = Ma. .— Ma . i ; : o 

In conceiving a universe stratified in this way, our end in view is to explore the consequences 

of a linear law relating the B-score means to the A-scores, i.e. linear regression of the B-score 
on the A-score. In doing so, we can take advantage of the assumptions inherent in the formula- 
tion of such a law, as stated in 17.00. That is to say, our B-scores have 2 additive independent 
components, the distribution of the error component about zero mean being the same for every 


A-score in any sub-universe specified by a fixed-A set. When there is perfect linear regression 
in the stratified universe so conceived, we imply that 


M; am My = kr: Xu and Ms. (= ee . . (1v) 


If we express a physical law in linear form, /,., in the above is the true value from which an 
observed B-score (x,.,) deviates on account of instrumental error or imperfect control. Accord- 
ingly, we define our errors by the relations 


€. = o.a — Mri. a = o.a — Mo — Koala — Ma) . e e (v) 
Xo. a = €n. a + Mo. a = ev. a + Mo + Koala — Ma) . : e (vi) 
In conformity with this notation, we denote the mean sample error for a particular A-score as 
Mo. ace = Eb. al€o. a) = Mo. acs — Roa. Xa — Mo; 
M,..as = Mo. aes) = My. as — Roa. Xa — My = (Mr. a — Mi) — Rya. Xa 
Mia m oad M aam Pi Se . (vi) 
Whence we may also write 
VAM. aia) oe EAM. ai aa)” a ES, acs) a V(M,. acs) . . (viii) 


That $M_{e.as} = 0$ is implicit in the assumption that errors are distributed with zero mean
independently of the A-scores and hence of the true score values ($M_{b.a}$). Whence, of course,
follows the identity

$$M_{b.as} = M_{b.a} \tag{ix}$$

From the definition of $M_{e.acs}$ above we might also write

$$M_{e.cs} = E_a(M_{e.acs}) = M_{b.cs} - k_{ba}(M_{a.s} - M_a) - M_b \tag{x}$$

$$E_c(M_{e.cs}) = M_{b.s} - M_b - k_{ba}(M_{a.s} - M_a) \tag{xi}$$

$$M_{b.as} - M_{b.s} = k_{ba} X_{a.s} \quad\text{and}\quad M_{b.s} - M_b = k_{ba}(M_{a.s} - M_a) \tag{xii}$$

*** Alternatively we have:

$$E_c(M_{e.cs}) = 0 = E_c(M_{b.cs}) - M_b - k_{ba}(M_{a.s} - M_a) = (M_{b.s} - M_b) - k_{ba}(M_{a.s} - M_a).$$

From (xi) we have

$$M_{e.cs} = M_{b.cs} - M_{b.s}$$

$$\therefore V(M_{e.cs}) = V(M_{b.cs}) \tag{xiii}$$
If the sample contains p paired scores, the expression on the left of (xiii) is the variance of the complete
distribution of the mean of a p-fold sample of errors, being the same for every sub-universe referable
to a fixed-A set. If therefore $\sigma_e^2$ is the variance of the error u.s.d.,

$$V(M_{e.cs}) = \frac{\sigma_e^2}{p} = V(M_{b.cs}) \tag{xiv}$$

Since our assumption in conceiving a law as a description of such a universe is that the errors,
being independent of the true value, have the same random-wise distribution about zero mean for every
fixed value of A, we may write $V_{e.a} = \sigma_e^2 = M(V_{e.a})$. This relation makes explicit the implication
that the universe is homoscedastic in the B-dimension, since


Vi E Mi RV os 
T A 


By reshuffling our grid cells, so that all B-scores referable to a fixed A-score constitute a column of a
2-dimensional lay-out, we can make explicit the tautology

$$\sigma_b^2 = M(V_{b.a}) + V(M_{b.a}) = \sigma_e^2 + V(M_{b.a})$$

When regression is linear

$$V(M_{b.a}) = r_{ab}^2\,\sigma_b^2 = k_{ba}^2\,\sigma_a^2$$

$$\therefore\ \sigma_b^2 = \sigma_e^2 + r_{ab}^2\,\sigma_b^2 = \sigma_e^2 + k_{ba}^2\,\sigma_a^2$$

$$\therefore\ (1 - r_{ab}^2)\,\sigma_b^2 = \sigma_e^2 = \sigma_b^2 - k_{ba}^2\,\sigma_a^2.$$

We have thus split our total B-score variance into 2 additive components,
(i) that of the hypothetical true values we seek to estimate:

$$r_{ab}^2\,\sigma_b^2 = k_{ba}^2\,\sigma_a^2 \tag{xvi}$$

(ii) that of the errors we make in attempting to do so:

$$\sigma_e^2 = (1 - r_{ab}^2)\,\sigma_b^2 \tag{xvii}$$


*** We shall require (xiii)-(xix) at a later stage; but the reader may prefer to go straight to the derivation of
(xx)-(xxvi) and return to the section between asterisks thereafter.

Within a stratum the error variance for a single value of the A-score is

$$V_{e.as} = E_c\,E_{b.a}(e_{b.a} - M_{e.as})^2$$

In this expression, as in the derivation of (vi):

$$e_{b.a} - M_{e.as} = (x_{b.a} - k_{ba} X_a - M_b) - (M_{b.as} - k_{ba} X_a - M_b) = (x_{b.a} - M_{b.as}),$$

$$\therefore\ V_{e.as} = V_{b.as} \tag{xviii}$$

By a 2-dimensional lay-out of the errors referable to fixed A-score values within the stratum, we
derive the tautology

$$V_{e.s} = M(V_{e.as}) + V(M_{e.as}).$$

In this expression, $V(M_{e.as}) = 0$, since $M_{e.as} = 0$ from (vii) above, so that

$$V_{e.s} = M(V_{e.as}).$$

Whence from (xviii)

$$V_{e.s} = M(V_{b.as}).$$

Now the distribution of errors is the same for all sub-universes, being independent of the A-scores,
whence $V_{e.s} = \sigma_e^2$.

$$\therefore\ M(V_{b.as}) = \sigma_e^2 = M(V_{e.as}) \tag{xix}$$


*** Within a stratum or within a sample (layer) we may postulate for algebraic convenience
hypothetical regression scores which conform to the tautologies specified in 11.04, viz.:

$$x_{r.as} = M_{b.s} + k_{ba.s}\,X_{a.s} \quad\text{and}\quad x_{r.acs} = M_{b.cs} + k_{ba.cs}\,X_{a.cs} \tag{xx}$$

In these expressions the definitions of $k_{ba.s}$ and $k_{ba.cs}$ are implicit, as indicated in 11.04, viz.:

$$k_{ba.s} = \frac{\mathrm{Cov}(x_{a.s}, x_{b.s})}{V_{a.s}} \quad\text{and}\quad k_{ba.cs} = \frac{\mathrm{Cov}(x_{a.cs}, x_{b.cs})}{V_{a.cs}} \tag{xxi}$$

In virtue of the identity of A-score distributions in each layer of the stratum, $V_{a.cs} = V_{a.s}$
and

$$k_{ba.cs} = \frac{\mathrm{Cov}(x_{a.cs}, x_{b.cs})}{V_{a.s}} \tag{xxii}$$

If we label our coefficients appropriately, we may adapt a tautology of correlation set forth in
11.04 without danger of confusion, viz.:

$$r_{ab}^2 = \frac{k_{ba}^2\,V_a}{V_b}; \qquad r_{ab.s}^2 = \frac{k_{ba.s}^2\,V_{a.s}}{V_{b.s}} \quad\text{and}\quad r_{ab.cs}^2 = \frac{k_{ba.cs}^2\,V_{a.s}}{V_{b.cs}} \tag{xxiii}$$


By making use of the identities $V_{a.cs} = V_{a.s}$ and $M_{b.as} = M_{b.a}$ we can establish the
two following conclusions:

(i) Within the stratum, the mean value of $k_{ba.cs}$ is $k_{ba.s}$, i.e. for a fixed-A set $k_{ba.cs}$ is
the unbiased estimate of $k_{ba.s}$. This follows at once from (xxii) of 11.08, which
exhibits the covariance of the pooled sample paired scores as the mean value of the
covariances of those of the sub-samples when the border A-score distribution is the
same for all sub-samples, as we here predicate of the stratum. Within the stratum we
may thus write

$$E_c\,\mathrm{Cov}(x_{a.cs}, x_{b.cs}) = \mathrm{Cov}(x_{a.s}, x_{b.s}).$$

Within the stratum $V_{a.cs} = V_{a.s}$ is constant from layer to layer, so that

$$E_c(k_{ba.cs}) = \frac{E_c\,\mathrm{Cov}(x_{a.cs}, x_{b.cs})}{V_{a.cs}} = \frac{\mathrm{Cov}(x_{a.s}, x_{b.s})}{V_{a.s}} = k_{ba.s} \tag{xxiv}$$


(ii) The value of $k_{ba.s}$ is the same in all strata, i.e. for the complete sampling distribution
of any fixed-A set. By (xii) above $(M_{b.as} - M_{b.s}) = k_{ba} X_{a.s}$. Whence from
(xi) of 11.02:

$$\mathrm{Cov}(x_{a.s}, x_{b.s}) = E_a\,[X_{a.s}(M_{b.as} - M_{b.s})] = E_a(k_{ba} X_{a.s}^2) = k_{ba}\,V_{a.s}$$

$$\therefore\ k_{ba.s} = \frac{\mathrm{Cov}(x_{a.s}, x_{b.s})}{V_{a.s}} = k_{ba} \tag{xxv}$$

This result permits us to interpret (xxiv) in the form

$$E_c(k_{ba.cs}) = k_{ba} \tag{xxvi}$$

This means that the mean of all sample values of $k_{ba.cs}$ in the fixed-A set is the true regression
coefficient. We express this by saying that the sample statistic which is an unbiased estimate
of the true regression coefficient ($k_{ba}$) of the parent universe is the ratio of the sample covariance
to the sample variance of the A-score distribution.
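
The conclusion (xxvi) lends itself to a rough numerical check. The sketch below is an illustration only, not part of the original derivation: it assumes a hypothetical fixed-A set of 16 A-scores, a hypothetical true law with slope 0.54 and origin constant -0.10, and normal errors of unit variance, all chosen arbitrarily. It draws many samples in which the A-scores stay fixed and only the errors change, and averages the sample slopes.

```python
import random


def sample_slope(a_scores, k_true, c_true, sigma, rng):
    """Slope (sample covariance / sample A-score variance) of one simulated sample."""
    # Each B-score is the true value on the line plus an independent normal error.
    b_scores = [k_true * a + c_true + rng.gauss(0.0, sigma) for a in a_scores]
    p = len(a_scores)
    m_a = sum(a_scores) / p
    m_b = sum(b_scores) / p
    cov = sum((a - m_a) * (b - m_b) for a, b in zip(a_scores, b_scores)) / p
    v_a = sum((a - m_a) ** 2 for a in a_scores) / p
    return cov / v_a


# Hypothetical fixed-A set and hypothetical universe parameters (assumptions).
A_SCORES = [1] * 2 + [2] * 6 + [3] * 6 + [4] * 2
K_TRUE, C_TRUE, SIGMA = 0.54, -0.10, 1.0

rng = random.Random(12345)  # fixed seed so that the run is repeatable
slopes = [sample_slope(A_SCORES, K_TRUE, C_TRUE, SIGMA, rng) for _ in range(20000)]
mean_slope = sum(slopes) / len(slopes)
print(f"mean of sample slopes = {mean_slope:.3f} (true slope {K_TRUE})")
```

With the A-scores held fixed from sample to sample, the mean of the simulated slopes should settle close to the assumed true slope, which is what (xxvi) asserts.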


If we assume what is rarely true (p. 714) of A-score sampling in experimental work, i.e. that our
sample of A-scores is random, we can now derive the unbiased estimate of the true covariance.
For random samples of p paired scores we may write

$$E_s(V_{a.s}) = \frac{p - 1}{p}\,\sigma_a^2.$$

From (xxv) above

$$\mathrm{Cov}(x_{a.s}, x_{b.s}) = k_{ba}\,V_{a.s}$$

$$\therefore\ E_s\,E_c\,\mathrm{Cov}(x_{a.cs}, x_{b.cs}) = E_s\,\mathrm{Cov}(x_{a.s}, x_{b.s}) = \frac{(p - 1)}{p}\,k_{ba}\,\sigma_a^2 = \frac{(p - 1)}{p}\,\mathrm{Cov}(x_a, x_b).$$

In sampling at random w.r.t. the A-score, the unbiased estimate of the covariance is therefore

$$\frac{p}{(p - 1)}\,\mathrm{Cov}(x_{a.cs}, x_{b.cs}) = \sum_{j=1}^{p} \frac{(x_{aj} - M_{a.cs})(x_{bj} - M_{b.cs})}{(p - 1)} \tag{xxvii}$$
In what follows, we shall need to derive an estimate of the B-score variance of the stratum,
which we may define by recourse to a 2-dimensional rearrangement of the cells in columns
referable to a fixed A-score as

$$V_{b.s} = M(V_{b.as}) + V(M_{b.as}) = M(V_{b.as}) + k_{ba.s}^2\,V_{a.s} = M(V_{b.as}) + k_{ba}^2\,V_{a.s}.$$

Hence from (xix) above

$$V_{b.s} = \sigma_e^2 + k_{ba}^2\,V_{a.s} \tag{xxviii}$$

If we re-arrange the cells of the stratum, so that all B-scores of the same layer lie in one column
of a 2-dimensional grid,

$$V_{b.s} = M(V_{b.cs}) + V(M_{b.cs}) = E_c(V_{b.cs}) + V(M_{b.cs}).$$

Whence from (xxviii) and (xiv), we derive

$$\sigma_e^2 + k_{ba}^2\,V_{a.s} = E_c(V_{b.cs}) + \frac{\sigma_e^2}{p}$$

$$\therefore\ E_c(V_{b.cs}) = \frac{(p - 1)}{p}\,\sigma_e^2 + k_{ba}^2\,V_{a.s} \tag{xxix}$$


It is implicit in the fundamental principle of the fixed-A set ($M_{b.as} = M_{b.a}$) that regression
is strictly linear within the sub-universe, if also strictly linear in the stratified universe as a whole,
and the left hand side of (xii) makes this explicit, viz.:

$$M_{b.as} - M_{b.s} = k_{ba}\,X_{a.s}.$$

Accordingly, we can alternatively define our errors in terms of deviations from the stratum
regression line or from that of the universe as a whole in either of two ways:

$$e_{b.a} = x_{b.a} - M_{b.a} = x_{b.a} - k_{ba}(x_a - M_a) - M_b = x_{b.a} - k_{ba} X_a - M_b;$$

$$e_{b.a} = x_{b.a} - M_{b.as} = x_{b.a} - k_{ba}(x_a - M_{a.s}) - M_{b.s} = x_{b.a} - k_{ba} X_{a.s} - M_{b.s} \tag{xxx}$$

These alternatives are consistent, since they imply the relation on the right of (xii), namely
$M_{b.s} - M_b = k_{ba}(M_{a.s} - M_a)$.

17.02 COMPUTATION FOR THE REGRESSION EQUATION 


In 17.00 we have defined the practical problem of determining the best value of the physical
constant of a law expressible in linear form as that of estimating the true value of a regression
constant in a universe of which we can predicate exact linear regression. We have now seen
that the sample value which is in fact such an unbiased estimate is

$$k_{ba.cs} = \frac{\mathrm{Cov}(x_{a.cs}, x_{b.cs})}{V_{a.cs}} \tag{i}$$

To say this does not mean that the sample value $M_{b.acs}$ of the mean B-score for a fixed A-score
will fall exactly on the line whose slope is $k_{ba.cs}$. Our confidence in the law merely expresses
the anticipation that observed values of the B-score means for fixed A-scores in a particular
experiment will cluster closely around the line of slope $k_{ba.cs}$ passing through the sample mean
B-scores ($M_{b.cs}$) and A-scores ($M_{a.cs}$).

To keep our feet on the ground we may here pause to consider the use of (i) to determine
the elastic coefficient ($k_{ba}$) of Hooke's law cited illustratively in 17.00 above.

Example. The data of a class experiment are as below.

Load (grams)      Mean stretch (mm.)
     1                  0.4
     2                  1.1
     3                  1.4
     4                  2.1

To find the elastic constant we proceed as follows:

$$M_a = \tfrac{1}{4}(1 + 2 + 3 + 4) = 2.5;$$

$$M_b = \tfrac{1}{4}(0.4 + 1.1 + 1.4 + 2.1) = 1.25;$$

$$V_a = \tfrac{1}{4}(1 + 4 + 9 + 16) - (2.5)^2 = 1.25;$$

$$V_b = \tfrac{1}{4}(0.4^2 + 1.1^2 + 1.4^2 + 2.1^2) - (1.25)^2 = 0.3725;$$

$$\mathrm{Cov}(x_a, x_b) = \tfrac{1}{4}[1(0.4) + 2(1.1) + 3(1.4) + 4(2.1)] - (2.5)(1.25) = 0.675;$$

$$k_{ba} = \frac{0.675}{1.25} = 0.54.$$


In accordance with the notation we have used elsewhere (11.04 and 17.01), we may now
formulate the regression equation which expresses the relation between the load ($x_a$) and the
hypothetical regression score ($x_r$) which is our estimate of the corresponding stretch,

$$x_r - M_b = k_{ba}(x_a - M_a) \quad\text{or}\quad x_r = k_{ba}\,x_a + C \tag{ii}$$

The value of the constant definitive of the origin in (ii) is

$$C = M_b - k_{ba}\,M_a.$$

In this case

$$C = 1.25 - 0.54(2.5) = -0.10.$$

The observed values and so-called predicted (i.e. assigned) values, i.e. values calculated in
accordance with (i) and (ii) in agreement with what physicists call the least square method,
would thus be

Observed     0.4     1.1     1.4     2.1
Predicted    0.44    0.98    1.52    2.06

In this example the variance of the regression score distribution is

$$k_{ba}^2\,V_a = \frac{(0.675)^2}{1.25} = 0.3645.$$

As a fraction of the variance of the distribution of stretch scores this is

$$\frac{0.3645}{0.3725} = 0.98.$$

We may thus say that a linear law here accounts for 98 per cent. of the variance of the stretch
score distribution.
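
The arithmetic of the example can be retraced by machine. The following is a minimal sketch, assuming only the four load and stretch values tabulated above; the function name `linear_regression` is ours and purely illustrative.

```python
def linear_regression(xs, ys):
    """Return (slope k, origin constant C, fraction of B-variance explained)."""
    p = len(xs)
    m_a = sum(xs) / p                      # mean A-score (load)
    m_b = sum(ys) / p                      # mean B-score (stretch)
    v_a = sum(x * x for x in xs) / p - m_a ** 2
    v_b = sum(y * y for y in ys) / p - m_b ** 2
    cov = sum(x * y for x, y in zip(xs, ys)) / p - m_a * m_b
    k = cov / v_a                          # equation (i): covariance / A-score variance
    c = m_b - k * m_a                      # constant definitive of the origin in (ii)
    explained = k * k * v_a / v_b          # regression variance as a fraction of V_b
    return k, c, explained


loads = [1, 2, 3, 4]
stretch = [0.4, 1.1, 1.4, 2.1]
k, c, explained = linear_regression(loads, stretch)
predicted = [round(k * x + c, 2) for x in loads]
print(round(k, 2), round(c, 2), predicted, round(explained, 2))
```

The predicted values agree with the tabulated ones only if the constant C is negative, i.e. 1.25 - 1.35 = -0.10.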


In the notation of 17.01 the regression equation expressing the so-called predicted value
($x_{r.a}$) of a B-score for an A-score is

$$x_{r.a} = k_{ba.cs}\,x_a + C_{cs}; \qquad C_{cs} = M_{b.cs} - k_{ba.cs}\,M_{a.cs} \tag{iii}$$

It is important to distinguish the above from the exact relation connecting the observed
B-score ($x_b$) with the A-score. The appropriate equation which expresses it involves the true
value of the regression coefficient and an error term, viz.:

$$x_b = k_{ba}\,x_a + C + e_b; \qquad C = M_{b.s} - k_{ba}\,M_{a.s} \tag{iv}$$

For speedy computation of $k_{ba.cs}$ we may proceed as follows for p paired values:

$$\sum_{j=1}^{p} x_{a.j}^2 = S_{aa}; \qquad \sum_{j=1}^{p} x_{b.j}^2 = S_{bb} \tag{v}$$

$$\sum_{j=1}^{p} x_{a.j} = S_a = p\,M_a; \qquad \sum_{j=1}^{p} x_{b.j} = S_b = p\,M_b \tag{vi}$$

$$\sum_{j=1}^{p} x_{a.j}\,x_{b.j} = S_{ab} \tag{vii}$$

In this notation

$$\mathrm{Cov}(x_a, x_b) = \frac{S_{ab}}{p} - \frac{S_a\,S_b}{p^2}; \qquad V_a = \frac{S_{aa}}{p} - \frac{S_a^2}{p^2}.$$

Whence in (iii) above

$$k_{ba.cs} = \frac{p\,S_{ab} - S_a\,S_b}{p\,S_{aa} - S_a^2} \tag{viii}$$

$$C_{cs} = \frac{1}{p}\,(S_b - k_{ba.cs}\,S_a) \tag{ix}$$

The computation sheet will thus require five columns:

          x_a     x_b     x_a^2    x_b^2    x_a.x_b
Totals    S_a     S_b     S_aa     S_bb     S_ab
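
The five-column computation sheet translates directly into a few running totals. The sketch below is illustrative: it assumes the Hooke's law data of the example above, and the function name `regression_from_sums` is our own.

```python
def regression_from_sums(xs, ys):
    """Slope and origin constant from the five column totals of the computation sheet."""
    p = len(xs)
    s_a = sum(xs)                              # S_a  = sum of x_a
    s_b = sum(ys)                              # S_b  = sum of x_b
    s_aa = sum(x * x for x in xs)              # S_aa = sum of x_a squared
    s_bb = sum(y * y for y in ys)              # S_bb = sum of x_b squared (needed for V_b only)
    s_ab = sum(x * y for x, y in zip(xs, ys))  # S_ab = sum of products
    k = (p * s_ab - s_a * s_b) / (p * s_aa - s_a ** 2)   # equation (viii)
    c = (s_b - k * s_a) / p                               # equation (ix)
    return {"S_a": s_a, "S_b": s_b, "S_aa": s_aa, "S_bb": s_bb, "S_ab": s_ab,
            "k": k, "C": c}


sheet = regression_from_sums([1, 2, 3, 4], [0.4, 1.1, 1.4, 2.1])
print(sheet)
```

Working from totals avoids computing deviations from the means row by row, which is the whole point of the speedy method; the slope and constant agree with those obtained from the covariance and variance directly.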


17.03 UNBIASED ESTIMATES OF REGRESSION PARAMETERS 


Having established the conclusion that the ratio of the sample covariance of the bivariate
distribution to the sample variance of the A-score distribution is an unbiased estimate of the slope
constant (regression coefficient) if the mean B-score of the u.s.d. is a linear function of the A-score,
we have as yet no criterion of the sampling error of the regression coefficient nor of the sampling
error of a B-score estimate based on its sample value; and we have still to justify our confidence
that the true law of the universe is linear when a sample of paired scores furnishes us with the
only available precise information about its structure.

Before attempting to answer the questions implicit in the last two sentences, we may with 
advantage retrace our steps to the significance test for the correlation ratio in 16.08. This test 
purports to answer the question : have we good reason for believing that there is a law asserting 
the dependence of the B-score on the A-score ? When they take the next step by asking whether 
the form of the law is linear, exponents of Fisher's test procedures follow the same path 
inasmuch as they seek to define what different estimates of the variance of the B-score distribu- 
tion or that of its error components must be consistent, if the null hypothesis is correct. 

Since the significance tests which we shall now examine lean heavily on those developed in
Chapter 16 in connexion with the Analysis of Variance, it is helpful to view the sample structure
as a score-grid on all fours with the customary schema of 2 columns and p rows for p paired score
values. Accordingly, we recall the illustration (p. 431) at the end of 11.01. Below is a
symmetrical frequency grid exhibiting 16 paired scores in the range 1 to 4. We shall also lay it
out as a score-grid (Table 1).

                        A-score
B-score       1       2       3       4     Total    Sum    Mean
   1          1       1       -       -       2        3    1 1/2
   2          1       3       2       -       6       13    2 1/6
   3          -       2       3       1       6       17    2 5/6
   4          -       -       1       1       2        7    3 1/2
 Total        2       6       6       2      16       40    2 1/2
 Sum          3      13      17       7      40
 Mean       1 1/2   2 1/6   2 5/6   3 1/2   2 1/2


The alternative lay-out calls for a modification of our notation exhibited in the accompanying
score-grid on the assumption that our concern is with the regression of the B-score on the A-score.
We then need to distinguish one A-score ($a_i$) from another in virtue of its numerical value only,
indicated by the rank subscript i, but we need also to distinguish B-scores ($b_{i.j}$) associated
with the same value of the A-score and label them with a double subscript accordingly.

TABLE 1

                                             For fixed value of A-score
A-score (a_i)    B-scores (b_i.j)            Total of B-scores (T_b.i)    No. of B-scores (p_i)
a_1 = 1          1, 2                        T_b.1 = 3                    p_1 = 2
a_2 = 2          1, 2, 2, 2, 3, 3            T_b.2 = 13                   p_2 = 6
a_3 = 3          2, 2, 3, 3, 3, 4            T_b.3 = 17                   p_3 = 6
a_4 = 4          3, 4                        T_b.4 = 7                    p_4 = 2

$$T_a = \sum_{i=1}^{4} p_i\,a_i = 40; \qquad T_b = \sum_{i=1}^{4}\sum_{j=1}^{p_i} b_{i.j} = \sum_{i=1}^{4} T_{b.i} = 40; \qquad p = \sum_{i=1}^{4} p_i = 16$$

$$M_a = \frac{T_a}{p} = \frac{40}{16}; \qquad M_b = \frac{T_b}{p} = \frac{40}{16}$$


In the accompanying schema we introduce no conventions to distinguish sample, stratum
(fixed-A set) or universe (whole 3-dimensional grid) parameters. Where necessary we can do
this as in 17.01, e.g. for mean B-scores associated with the fixed A-score $a_i$ we may write $M_{b.ics}$,
$M_{b.is}$, $M_{b.i}$. For the present, this need not concern us. It will suffice to write

$$M_a = \frac{1}{p}\sum_{i=1}^{c} p_i\,a_i; \qquad M_{b.i} = \frac{1}{p_i}\sum_{j=1}^{p_i} b_{i.j};$$

$$\sum_{j=1}^{p_i} b_{i.j} = T_{b.i} = p_i\,M_{b.i};$$

$$V_{b.i} = \frac{1}{p_i}\sum_{j=1}^{p_i} (b_{i.j} - M_{b.i})^2;$$

$$M(V_{b.i}) = \frac{1}{p}\sum_{i=1}^{c} p_i\,V_{b.i} = \frac{1}{p}\sum_{i=1}^{c}\sum_{j=1}^{p_i} (b_{i.j} - M_{b.i})^2 = V_b - \frac{1}{p}\sum_{i=1}^{c} p_i\,(M_{b.i} - M_b)^2 = V_b - V(M_{b.i});$$

$$\mathrm{Cov}(a_i, b_{i.j}) = \frac{1}{p}\sum_{i=1}^{c} p_i\,(a_i - M_a)(M_{b.i} - M_b).$$


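When we employ this notation henceforth, we shall write $(a_i - M_a) = A_i$, so that

$$\sum_{i=1}^{c} p_i\,A_i = 0 \tag{i}$$

$$\mathrm{Cov}(a_i, b_{i.j}) = \frac{1}{p}\sum_{i=1}^{c} p_i\,A_i\,(M_{b.i} - M_b) \tag{ii}$$

In what follows, we shall need to recall an important property of covariances, viz. that

$$\mathrm{Cov}(a_i, b_{i.j}) = \frac{1}{p}\sum_{i=1}^{c} p_i\,A_i\,M_{b.i} - \frac{M_b}{p}\sum_{i=1}^{c} p_i\,A_i = \frac{1}{p}\sum_{i=1}^{c} p_i\,A_i\,M_{b.i} \tag{iii}$$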
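
The notation of this section can be tried out on the 16 paired scores of the frequency grid. The sketch below is only an illustration: the dictionary `COUNTS` transcribes the grid, the function and key names are ours, and exact fractions are used so that the two forms of the covariance, the direct product-moment and the weighted sum of column means in (iii), can be compared without rounding.

```python
from fractions import Fraction

# COUNTS[(a, b)] = number of pairs with A-score a and B-score b (the 16-pair grid).
COUNTS = {(1, 1): 1, (1, 2): 1, (2, 1): 1, (2, 2): 3, (2, 3): 2,
          (3, 2): 2, (3, 3): 3, (3, 4): 1, (4, 3): 1, (4, 4): 1}


def grid_statistics(counts):
    p = sum(counts.values())
    m_a = Fraction(sum(a * n for (a, _), n in counts.items()), p)
    m_b = Fraction(sum(b * n for (_, b), n in counts.items()), p)
    a_values = sorted({a for a, _ in counts})
    # p_i and T_b.i for each fixed A-score, then the column means M_b.i
    p_i = {a: sum(n for (x, _), n in counts.items() if x == a) for a in a_values}
    t_b = {a: sum(b * n for (x, b), n in counts.items() if x == a) for a in a_values}
    col_mean = {a: Fraction(t_b[a], p_i[a]) for a in a_values}
    v_a = sum(p_i[a] * (a - m_a) ** 2 for a in a_values) / p
    # Covariance computed pair by pair ...
    cov_direct = sum(n * (a - m_a) * (b - m_b) for (a, b), n in counts.items()) / p
    # ... and by the property (iii): (1/p) * sum of p_i * A_i * M_b.i
    cov_means = sum(p_i[a] * (a - m_a) * col_mean[a] for a in a_values) / p
    return {"p": p, "M_a": m_a, "M_b": m_b, "col_mean": col_mean,
            "V_a": v_a, "cov_direct": cov_direct, "cov_means": cov_means,
            "k": cov_direct / v_a}


stats = grid_statistics(COUNTS)
print(stats["col_mean"], stats["cov_direct"], stats["k"])
```

For this grid the column means rise by equal steps, so the sample regression of the B-score on the A-score is exactly linear.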
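Within the fixed-A set, $A_{i.s}$ has the meaning $(a_i - M_{a.s})$. We may therefore write

$$k_{ba.cs} = \frac{\sum_{i=1}^{c} p_i\,A_{i.s}\,M_{b.ics}}{p\,V_{a.s}} \quad\text{and}\quad k_{ba.s} = \frac{\sum_{i=1}^{c} p_i\,A_{i.s}\,M_{b.is}}{p\,V_{a.s}}$$

$$\therefore\ (k_{ba.cs} - k_{ba.s})\,p\,V_{a.s} = \sum_{i=1}^{c} p_i\,A_{i.s}\,(M_{b.ics} - M_{b.is}) \tag{iv}$$

Whence from (vii) of 17.01

$$(k_{ba.cs} - k_{ba})\,p\,V_{a.s} = \sum_{i=1}^{c} p_i\,A_{i.s}\,M_{e.ics} \tag{v}$$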


So long as we restrict our attention to the fixed-A set, we view the sampling process through 
the spectacles of Churchill Eisenhart's Model I in 13.04. That is to say, we conceive our sample 
as one of an endless repetition of experiments done in the same way. In the idiom of 17.01, our 
bivariate universe is thus a stratum with all possible values of the B-score distribution for a 
fixed-A set ; and the student must therefore lay aside preoccupations suggested by any previous 
acquaintance with the so-called bivariate normal universe. With more or less propriety, we may 
assume an approximately normal distribution of the B-scores for any fixed value of A; but 
the distribution of the A-scores will rarely be approximately normal—and indeed calls for no 
explicit specification—in the infinite succession of similar experiments among which our actual 
sample constitutes a single act of repetition. Throughout what follows, as in later sections of
this chapter, we therefore assume consistently that the relevant framework of repetition is a
fixed-A set which may have any distribution involving at least 3 different A-score values. This
assumption is unnecessary only when $k_{ba} = 0 = r_{ab}$ in the universe as a whole. In effect, we then sample
within the stratum of the universe of 17.01. On this understanding, it will be convenient to
write

$$C_i = \frac{A_{i.s}}{p\,V_{a.s}} \quad\text{so that}\quad C_i^2 = \frac{A_{i.s}^2}{\left(\sum p_i\,A_{i.s}^2\right)^2} \tag{vi}$$

Whence from (i)

$$\sum_{i=1}^{c} p_i\,C_i = 0 \quad\text{and}\quad \sum_{i=1}^{c} p_i\,C_i^2 = \frac{1}{\sum_{i=1}^{c} p_i\,A_{i.s}^2} \tag{vii}$$

We may now rewrite (v) as

$$(k_{ba.cs} - k_{ba}) = \sum_{i=1}^{c} p_i\,C_i\,M_{e.ics} \tag{viii}$$


Our next step is sufficiently important to justify a digression. Suppose that the player's
score (x) at a single trial is the sum of some fixed multiple ($C_m$) of each score ($x_m$) recorded by
one of a set of p lottery wheels, as in the Orthogonal Lottery Model of Chapter 12, i.e.

$$x = C_1 x_1 + C_2 x_2 + C_3 x_3 + \ldots + C_p x_p.$$

If the variance of the player's distribution is $V_x$ and that of the score distribution of the
mth wheel $V_m$,

$$V_x = C_1^2 V_1 + C_2^2 V_2 + C_3^2 V_3 + \ldots + C_p^2 V_p.$$

If the score distributions of the wheels are identical, we may write $\sigma^2 = V_1 = V_2 = \ldots = V_p$ and

$$V_x = \sigma^2\,(C_1^2 + C_2^2 + \ldots + C_p^2) = \sigma^2 \sum_{m=1}^{p} C_m^2.$$

Furthermore, we may suppose that only c of the p values of $C_m$ are different, and there are $p_i$
identical values of $C_i$, so that

$$V_x = \sigma^2 \sum_{i=1}^{c} p_i\,C_i^2.$$

Instead of recording as his score the weighted sum of single trials of the p wheels, we may
vary the rule of the game so that the player records as his unit score the mean of spinning $p_i$
times each wheel to which we assign the particular weight $C_i$, the total weight attached to the
ith mean being then $p_i C_i$. The variance of the mean score distribution for each such wheel is
then ($\sigma^2 \div p_i$) and

$$V_x = \sum_{i=1}^{c} (p_i C_i)^2\,\frac{\sigma^2}{p_i} = \sigma^2 \sum_{i=1}^{c} p_i\,C_i^2 \tag{ix}$$

Now the hypothesis we are exploring as stated in 17.00 is that the errors associated with any
value of the A-score are independent of one another and of the value of the A-score itself. The
mean error-score ($M_{e.ics}$), whose expected value is zero, is therefore independent of the A-scores,
which occur with the same frequency from sample to sample within the fixed-A set. Within
the fixed-A set the variance ($\sigma_k^2$) of the distribution of $(k_{ba.cs} - k_{ba})$, or of $k_{ba.cs}$ since change
of origin does not affect the variance, is thus an expression of the same form as (ix), in which
the true variance of the mean error ($\sigma_e^2 \div p_i$) takes the place of ($\sigma^2 \div p_i$), i.e.

$$\sigma_k^2 = \sum_{i=1}^{c} (p_i C_i)^2\,\frac{\sigma_e^2}{p_i} = \sigma_e^2 \sum_{i=1}^{c} p_i\,C_i^2.$$

Whence from (vii) above:

$$\sigma_k^2 = \frac{\sigma_e^2}{\sum_{i=1}^{c} p_i\,A_{i.s}^2} = \frac{\sigma_e^2}{p\,V_{a.s}} \tag{x}$$
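
The two properties of the weights in (vii), on which (x) depends, are easily verified for a particular fixed-A set. The sketch below assumes the A-score distribution of the 16-pair grid (2, 6, 6 and 2 scores at the values 1 to 4); the variable names are ours.

```python
from fractions import Fraction

# Number of B-scores (p_i) at each fixed A-score (a_i), as in the 16-pair grid.
P_I = {1: 2, 2: 6, 3: 6, 4: 2}

p = sum(P_I.values())
m_a = Fraction(sum(a * n for a, n in P_I.items()), p)
# p * V_a is the sum of p_i * A_i^2, where A_i = a_i - M_a
p_v_a = sum(n * (a - m_a) ** 2 for a, n in P_I.items())
# Weights C_i = A_i / (p * V_a), as in (vi)
c_i = {a: (a - m_a) / p_v_a for a in P_I}

sum_pc = sum(P_I[a] * c_i[a] for a in P_I)          # first relation of (vii)
sum_pc2 = sum(P_I[a] * c_i[a] ** 2 for a in P_I)    # second relation of (vii)
print(p_v_a, sum_pc, sum_pc2)
```

For this A-score distribution the sum of $p_i A_i^2$ is 12, so that by (x) the sampling variance of the regression coefficient is one-twelfth of the error variance.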


If we now make the assumption that the error distribution is normal, whence that of the
mean of a $p_i$-fold sample is a normal variate, the principle of the fixed-A set implies that
$(k_{ba.cs} - k_{ba})$ as defined by (viii) is a sum of independent normal variates and is itself therefore
a normal variate with zero mean, since $k_{ba}$ is the mean value of $k_{ba.cs}$. Thus we may define
a normal square standard score (Chi-Square variate of 1 d.f.) by the ratio

$$\frac{(k_{ba.cs} - k_{ba})^2}{\sigma_k^2} = \frac{(k_{ba.cs} - k_{ba})^2\,p\,V_{a.s}}{\sigma_e^2} \tag{xi}$$

The error variance ($\sigma_e^2$) is, of course, the expected mean square deviation of the B-score
from its true mean value defined by the relation $M_{b.a} = k_{ba} X_a + M_b$. All our sample tells
us is the deviation of the B-score from the hypothetical regression score

$$x_{r.acs} = k_{ba.cs}\,X_{a.cs} + M_{b.cs}.$$


To get an unbiased estimate of $\sigma_e^2$, we therefore examine the implications of the tautology
of (xxi) in 11.04. In the notation of 17.01, this is

$$E_a\,E_{b.a}(x_{b.a} - x_{r.acs})^2 = V_{b.cs} - k_{ba.cs}^2\,V_{a.s} = (1 - r_{ab.cs}^2)\,V_{b.cs} \tag{xii}$$

Within the fixed-A set, $V_{a.s}$ is a constant. Thus the expected value of the mean square deviation
of the B-score from the sample regression line within the fixed-A set is

$$E_c(V_{b.cs}) - V_{a.s}\,E_c(k_{ba.cs}^2).$$

By (xxix) of 17.01, this is

$$\frac{(p - 1)}{p}\,\sigma_e^2 + k_{ba}^2\,V_{a.s} - V_{a.s}\,E_c(k_{ba.cs}^2) \tag{xiii}$$

Now by definition,

$$E_c(k_{ba.cs} - k_{ba})^2 = \sigma_k^2 = E_c(k_{ba.cs}^2) - k_{ba}^2$$

$$\therefore\ E_c(k_{ba.cs}^2) = \sigma_k^2 + k_{ba}^2.$$

Whence (xiii) becomes

$$\frac{(p - 1)}{p}\,\sigma_e^2 - V_{a.s}\,\sigma_k^2.$$

Also from (x) this is

$$\frac{(p - 1)}{p}\,\sigma_e^2 - \frac{\sigma_e^2}{p} = \frac{(p - 2)}{p}\,\sigma_e^2 \tag{xiv}$$

We may thus define a statistic whose expected value is $\sigma_e^2$ by the relations

$$s_e^2 = \frac{p}{(p - 2)}\,E_a\,E_{b.a}(x_{b.a} - x_{r.acs})^2 \quad\text{and}\quad E_c(s_e^2) = \sigma_e^2 \tag{xv}$$

Alternatively, we may write

$$s_e^2 = \frac{(1 - r_{ab.cs}^2)\,p\,V_{b.cs}}{p - 2} \tag{xvi}$$

Our unbiased estimate of the variance of the distribution of the regression coefficient is then

$$s_k^2 = \frac{s_e^2}{p\,V_{a.s}} = \frac{(1 - r_{ab.cs}^2)\,V_{b.cs}}{(p - 2)\,V_{a.s}} \tag{xvii}$$


If the number (p) of paired sample scores is large enough, we have now the materials to
supply an answer to two questions: (a) what are the implications of the hypothesis that there
is zero linear regression; (b) what may we rightly say if samples from different universes are
consistent with the assumption that there is the same increment of the B-score per unit increment
of the A-score in each. For a reason we shall discuss later more fully, (a) does not mean the
same thing as saying that there is no law connecting the two sets of scores. Here it suffices
to remark that the assumption of linear regression is explicit or implicit in (a) and (b), and
our justification for this presupposes the existence of a test for linearity.

From (xi), (xii) and (xvi) we may define the ratio of a square normal score with zero mean
to the unbiased estimate of its variance by

$$\frac{(p - 2)\,(k_{ba.cs} - k_{ba})^2\,V_{a.s}}{(1 - r_{ab.cs}^2)\,V_{b.cs}} \tag{xviii}$$

We may regard this as the square of an approximate c-ratio if the sample is large. To make use of it, we
need to know the value of $k_{ba}$. Ordinarily we do not know this; but zero covariance implies
that $r_{ab} = 0 = k_{ba}$. If so, (xviii) reduces to

$$\frac{(p - 2)\,k_{ba.cs}^2\,V_{a.s}}{(1 - r_{ab.cs}^2)\,V_{b.cs}} = \frac{(p - 2)\,r_{ab.cs}^2}{(1 - r_{ab.cs}^2)} \tag{xix}$$

Thus the normal integral, if we refer to it the square root of the expression on the right of (xix),
gives the approximate probability that the latter will be equal to or greater than its sample value
on the assumption that $k_{ba} = 0 = r_{ab}$; and this is therefore a significance test alike for zero
covariance or zero value of $k_{ba}$.
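
To fix ideas, the statistics of (xvi), (xvii) and (xix) may be evaluated for the 16 paired scores of the frequency grid. This is a sketch for illustration only; 16 pairs is scarcely a large sample in the sense the approximate test requires, and the names `COUNTS`, `s_e2`, `s_k2` and `ratio` are ours.

```python
from fractions import Fraction

# COUNTS[(a, b)] = number of pairs with A-score a and B-score b (the 16-pair grid).
COUNTS = {(1, 1): 1, (1, 2): 1, (2, 1): 1, (2, 2): 3, (2, 3): 2,
          (3, 2): 2, (3, 3): 3, (3, 4): 1, (4, 3): 1, (4, 4): 1}

p = sum(COUNTS.values())
m_a = Fraction(sum(a * n for (a, _), n in COUNTS.items()), p)
m_b = Fraction(sum(b * n for (_, b), n in COUNTS.items()), p)
v_a = sum(n * (a - m_a) ** 2 for (a, _), n in COUNTS.items()) / p
v_b = sum(n * (b - m_b) ** 2 for (_, b), n in COUNTS.items()) / p
cov = sum(n * (a - m_a) * (b - m_b) for (a, b), n in COUNTS.items()) / p

k = cov / v_a                           # sample regression coefficient
r2 = cov ** 2 / (v_a * v_b)             # square of the product-moment coefficient
s_e2 = (1 - r2) * p * v_b / (p - 2)     # equation (xvi): estimate of error variance
s_k2 = s_e2 / (p * v_a)                 # equation (xvii): estimated variance of k
ratio = (p - 2) * r2 / (1 - r2)         # equation (xix): square of the approximate c-ratio
print(k, r2, s_e2, s_k2, ratio)
```

The last figure is the same as $k^2 \div s_k^2$, which shows (xix) for what it is: the square of the sample regression coefficient divided by the estimate of its sampling variance.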


What is of more interest is the possibility of testing the significance of a difference between two
regression coefficients when covariance is not zero. We shall exhibit Fisher's test later; but
we may here pause to interpret (xviii) as a basis for an approximate c-test in the same
framework of assumptions when the size of each sample is large. If we have two large
independent samples of paired scores, one p-fold and one q-fold, we may use the normal
integral to test the assumption that the difference between the two sample values ($k_{ba.p}$
and $k_{ba.q}$) of $k_{ba.cs}$ is statistically trivial. The null hypothesis is then that the common
universe value of the regression coefficient is $k_{ba}$, so that the difference is

$$d = k_{ba.p} - k_{ba.q} = (k_{ba.p} - k_{ba}) - (k_{ba.q} - k_{ba}) \tag{xx}$$

This being the difference between two independent normal variates is itself a normal variate
with sampling variance which we may write without ambiguity as

$$\sigma_{k.pq}^2 = \sigma_{k.p}^2 + \sigma_{k.q}^2.$$

With analogous conventions for sample parameters, our estimate of $\sigma_{k.pq}^2$ will therefore be

$$s_{k.pq}^2 = \frac{(1 - r_{ab.p}^2)\,V_{b.p}}{(p - 2)\,V_{a.p}} + \frac{(1 - r_{ab.q}^2)\,V_{b.q}}{(q - 2)\,V_{a.q}} \tag{xxi}$$

Our approximate critical ratio for the difference test will therefore be

$$c = \frac{d}{s_{k.pq}} \tag{xxii}$$

[Fig. 122. Quadratic regression in a bivariate universe (sampling with replacement).]
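
The approximate difference test of (xx) to (xxii) can be sketched as follows. The function names are ours; the first sample is the Hooke's law data of 17.02, while the second is hypothetical, invented here purely to exercise the code. Samples of 4 pairs are far too small for the normal approximation, so the figure printed has no evidential value.

```python
import math


def slope_and_variance(xs, ys):
    """Return the sample regression coefficient and the estimate (xvii) of its variance."""
    p = len(xs)
    m_a = sum(xs) / p
    m_b = sum(ys) / p
    v_a = sum((x - m_a) ** 2 for x in xs) / p
    v_b = sum((y - m_b) ** 2 for y in ys) / p
    cov = sum((x - m_a) * (y - m_b) for x, y in zip(xs, ys)) / p
    k = cov / v_a
    r2 = cov * cov / (v_a * v_b)
    s_k2 = (1 - r2) * v_b / ((p - 2) * v_a)
    return k, s_k2


def critical_ratio(sample_p, sample_q):
    """Approximate critical ratio (xxii) for the difference between two slopes."""
    k_p, s2_p = slope_and_variance(*sample_p)
    k_q, s2_q = slope_and_variance(*sample_q)
    return (k_p - k_q) / math.sqrt(s2_p + s2_q)   # d divided by s_k.pq


sample_p = ([1, 2, 3, 4], [0.4, 1.1, 1.4, 2.1])   # data of the example in 17.02
sample_q = ([1, 2, 3, 4], [0.5, 0.9, 1.6, 2.0])   # hypothetical second sample
print(critical_ratio(sample_p, sample_q))
```

Interchanging the two samples merely changes the sign of the ratio, and a sample compared with itself gives zero, as the definition of d requires.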


The approximate test defined by (xix) involves the ratio of two complementary fractions
of the variance of the B-score distribution, viz. $r_{ab}^2 V_b$ and $(1 - r_{ab}^2)V_b$. Significance tests
prescribed by Fisher's school involve the derivation of certain relations relevant to other partitions
of the variance of the B-score distribution. We have already examined one such in 16.08, where
we exhibited a significance test for the correlation ratio ($\eta_{ba}^2$). If the result of the latter test can
show that the correlation ratio does not differ significantly from zero, there is, of course, no
need to ask whether the product-moment index significantly exceeds zero. We have seen
(footnote to 11.04) that $\eta_{ba}^2$ must be greater than $r_{ab}^2$ or equal to it, being equal if regression is
linear, including the trivial case $\eta_{ba}^2 = 0$. If the true value of $\eta_{ba}^2$ is greater than that of $r_{ab}^2$, there
must be a non-linear relation between the two sets of scores. Fig. 122 shows that a high value
of $\eta_{ba}^2$ is indeed consistent with zero covariance when the B-score means are a quadratic function
of the A-scores.

To say that $\eta_{ba}^2$ significantly exceeds zero and that $r_{ab}^2$ does not thus points to the existence
of a non-linear dependence of the B-score on the A-score. This gives us the clue to a way of
testing whether regression is in fact linear. We require to know the distribution of a sample
statistic which measures the excess of $\eta_{ba}^2$ over $r_{ab}^2$. To this end we shall first recall (Table 2)
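TABLE 2

Tautologies of Regression

Source and Parameter

1. Sampling Error

$$M(V_{b.acs}) = (1 - \eta_{ba.cs}^2)\,V_{b.cs}$$

$$E_a\,E_{b.a}(x_{b.a} - M_{b.acs})^2 = \frac{1}{p}\sum_{i=1}^{c}\sum_{j=1}^{p_i} (b_{i.j} - M_{b.ics})^2$$

2. Non-linear Regression

$$V(M_{b.acs}) - k_{ba.cs}^2\,V_{a.cs} = (\eta_{ba.cs}^2 - r_{ab.cs}^2)\,V_{b.cs}$$

$$E_a(M_{b.acs} - x_{r.acs})^2 = \frac{1}{p}\sum_{i=1}^{c} p_i\,(M_{b.ics} - x_{r.ics})^2$$

3. Linear Regression

$$k_{ba.cs}^2\,V_{a.cs} = r_{ab.cs}^2\,V_{b.cs}$$

$$E_a(x_{r.acs} - M_{b.cs})^2 = \frac{1}{p}\sum_{i=1}^{c} p_i\,k_{ba.cs}^2\,A_i^2$$

4. Influence of A on B

$$V(M_{b.acs}) = \eta_{ba.cs}^2\,V_{b.cs}$$

$$E_a(M_{b.acs} - M_{b.cs})^2 = \frac{1}{p}\sum_{i=1}^{c} p_i\,(M_{b.ics} - M_{b.cs})^2$$

5. Error

$$(1 - r_{ab.cs}^2)\,V_{b.cs}$$

$$E_a\,E_{b.a}(x_{b.a} - x_{r.acs})^2 = \frac{1}{p}\sum_{i=1}^{c}\sum_{j=1}^{p_i} (b_{i.j} - M_{b.cs} - k_{ba.cs}\,A_i)^2$$

6. Total of 1, 2 and 3, of 1 and 4 and of 3 and 5

$$V_{b.cs}$$

$$E_a\,E_{b.a}(x_{b.a} - M_{b.cs})^2 = \frac{1}{p}\sum_{i=1}^{c}\sum_{j=1}^{p_i} (b_{i.j} - M_{b.cs})^2$$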

as in the accompanying table certain necessary relations between B-scores, hypothetical linear 
regression scores and B-score means already exhibited as tautologies of the grid in 11.04. The 
column headed source in Table 2 anticipates the statistical interpretation we shall later impose 
on each parameter defined alternatively in the notation of this section and in that of 17.01. 
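
The additive relations among the six items of Table 2 hold for any set of paired scores, and a numerical check may help to fix them in mind. The sketch below assumes the 16 paired scores of the frequency grid in this section; the dictionary keys naming the six items are ours.

```python
from fractions import Fraction

# COUNTS[(a, b)] = number of pairs with A-score a and B-score b (the 16-pair grid).
COUNTS = {(1, 1): 1, (1, 2): 1, (2, 1): 1, (2, 2): 3, (2, 3): 2,
          (3, 2): 2, (3, 3): 3, (3, 4): 1, (4, 3): 1, (4, 4): 1}

p = sum(COUNTS.values())
m_a = Fraction(sum(a * n for (a, _), n in COUNTS.items()), p)
m_b = Fraction(sum(b * n for (_, b), n in COUNTS.items()), p)
a_values = sorted({a for a, _ in COUNTS})
p_i = {a: sum(n for (x, _), n in COUNTS.items() if x == a) for a in a_values}
col_mean = {a: Fraction(sum(b * n for (x, b), n in COUNTS.items() if x == a), p_i[a])
            for a in a_values}
v_a = sum(p_i[a] * (a - m_a) ** 2 for a in a_values) / p
cov = sum(n * (a - m_a) * (b - m_b) for (a, b), n in COUNTS.items()) / p
k = cov / v_a
x_r = {a: m_b + k * (a - m_a) for a in a_values}   # hypothetical regression scores

items = {
    # 1. mean square deviation of B-scores from their own column means
    "sampling_error": sum(n * (b - col_mean[a]) ** 2 for (a, b), n in COUNTS.items()) / p,
    # 2. mean square deviation of column means from the regression line
    "non_linear": sum(p_i[a] * (col_mean[a] - x_r[a]) ** 2 for a in a_values) / p,
    # 3. variance of the regression scores
    "linear": sum(p_i[a] * (x_r[a] - m_b) ** 2 for a in a_values) / p,
    # 4. variance of the column means
    "influence": sum(p_i[a] * (col_mean[a] - m_b) ** 2 for a in a_values) / p,
    # 5. mean square deviation of B-scores from the regression line
    "error": sum(n * (b - x_r[a]) ** 2 for (a, b), n in COUNTS.items()) / p,
    # 6. total variance of the B-scores
    "total": sum(n * (b - m_b) ** 2 for (_, b), n in COUNTS.items()) / p,
}
print(items)
```

In this grid the column means lie exactly on the sample regression line, so item 2 vanishes and items 3 and 4 coincide; a sample with curved regression would give a positive value for item 2.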

To make explicit what information about the universe each sample parameter of Table 2
supplies in terms of the source of variation, we must explore what is its expected value. The
hypothesis stated in 17.00 is that each observed B-score has one component, the true value, which
is a linear function of the A-score, and an independent (error) component ($e_{i.j}$) with zero mean.
This is on all fours with the consequential relation between the scores of the player and umpire 
in the model set-up of 12.01, where we postulate a relation of the type xy =k. Xu + Xa. o. 
We may translate our hypothesis into the language of 13.04, if we conceive the true value as a 
regression-factor in the 2 column grid of paired scores, i.e.

$$F_i = k_{ba}\,a_i + C \quad\text{and}\quad b_{i.j} = F_i + e_{i.j} \tag{xxiii}$$

The first 6 items of the second column of the score-grid for the 16 pair set exhibited in Table 1 then
take the form

a_1     b_1.1 = F_1 + e_1.1
a_1     b_1.2 = F_1 + e_1.2
a_2     b_2.1 = F_2 + e_2.1
a_2     b_2.2 = F_2 + e_2.2
a_2     b_2.3 = F_2 + e_2.3
a_2     b_2.4 = F_2 + e_2.4
The mean error is zero for any value of the A-score in the complete sample distribution of the
fixed-A set. For a fixed value of the A-score within the sample and within the sample distribution
of the fixed-A set, we may therefore write

$$M_{b.ics} = F_i + M_{e.ics} \quad\text{and}\quad M_{b.is} = E_c(M_{b.ics}) = F_i \tag{xxiv}$$

If the mean stratum value of $F_i$ is $M_{f.s}$, we may therefore write:

$$M_{b.cs} = M_{f.s} + M_{e.cs} \quad\text{and}\quad M_{b.s} = M_{f.s};$$

$$M_{b.ics} - M_{b.cs} = (F_i - M_{f.s}) + (M_{e.ics} - M_{e.cs}) \quad\text{and}\quad M_{b.is} - M_{b.s} = (F_i - M_{f.s}) \tag{xxv}$$

In the last expressions,

$$F_i - M_{f.s} = (k_{ba}\,a_i + C) - (k_{ba}\,M_{a.s} + C) = k_{ba}(a_i - M_{a.s}) = k_{ba}\,A_{i.s}.$$

Thus (xxv) expresses the relations defined by (xii) in 17.01 in terms of the factor concept of
13.04. In accordance with our treatment of the Model I balance sheet of variance, we shall
therefore write the variance of the A-B factor distribution:

$$\sigma_f^2 = k_{ba}^2\,V_{a.s} \tag{xxvi}$$
The sample value of the B-score variance for a fixed A-score depends only on the error
component, since the regression-factor is then fixed, i.e. $V_{b.ics} = V_{e.ics}$. If there are $p_i$
B-scores associated with one and the same value ($a_i$) of the A-score, we may write the expected
value of $V_{b.ics}$ in the notation of 13.04 as

$$E_c(V_{b.ics}) = \frac{(p_i - 1)}{p_i}\,\sigma_e^2.$$

In the same notation the expected value of the mean variance of the B-score distribution is

$$E_c\,M(V_{b.acs}) = E_c\,\frac{1}{p}\sum_{i=1}^{c} p_i\,V_{b.ics} = \frac{1}{p}\sum_{i=1}^{c} p_i\,E_c(V_{b.ics}) = \frac{\sigma_e^2}{p}\sum_{i=1}^{c} p_i\,\frac{(p_i - 1)}{p_i}.$$

In this expression

$$\frac{1}{p}\sum_{i=1}^{c} p_i\,\frac{(p_i - 1)}{p_i} = \frac{1}{p}\sum_{i=1}^{c} (p_i - 1) = \frac{p - c}{p}$$

$$\therefore\ E_c\,M(V_{b.acs}) = \frac{(p - c)}{p}\,\sigma_e^2.$$
And we may define one statistic which is an unbiased estimate of error variance by 
y Pes ES e 2 andi EA e ote) 


This explains the denomination of source against (1) in Table 2. We shall need (3) and (4) to
define (2) and can define (4) in terms of 1 and 6 since

$$V(M_{b.acs}) = V_{b.cs} - M(V_{b.acs}).$$

In this expression our Model I (fixed-A set) assumption signifies that

$$E_c(V_{b.cs}) = \frac{(p - 1)}{p}\,\sigma_e^2 + \sigma_f^2$$

$$\therefore\ E_c\,V(M_{b.acs}) = \frac{(p - 1)}{p}\,\sigma_e^2 + \sigma_f^2 - \frac{(p - c)}{p}\,\sigma_e^2 = \frac{(c - 1)}{p}\,\sigma_e^2 + \sigma_f^2.$$

Thus we may define a statistic whose expected value will be the error variance, if (and only if)
the row factor is zero:

$$s_m^2 = \frac{p\,V(M_{b.acs})}{c - 1} \quad\text{and}\quad E_c(s_m^2) = \sigma_e^2 + \frac{p\,\sigma_f^2}{c - 1} \tag{xxx}$$

This interprets 4 in Table 2, since the expected values of $s_m^2$ and of the error-variance estimate
defined above will be identical only if $\sigma_f = 0$, i.e. the A-score has no influence on the B-score.
To derive (3) we use the relation employed in deriving (xv), viz.:

$$E_c(k_{ba.cs}^2) = \sigma_k^2 + k_{ba}^2.$$


If regression is linear, we may substitute in accordance with (xxvi)

$$E_c(k_{ba.cs}^2\,V_{a.s}) = \frac{\sigma_e^2}{p} + \sigma_f^2 = E_c(r_{ab.cs}^2\,V_{b.cs}) \tag{xxxi}$$


On the assumption that regression is linear, we may thus define a statistic with the same essential 
property as sm by the relation 


Z= ua: P- Vp. and Eds?) = of + po, a . (xxxii) 
Hence from (xxx) and (xxxii), and hence subject to the assumption that regression is linear

$$E_c\,[(\eta_{ba.cs}^2 - r_{ab.cs}^2)\,V_{b.cs}] = \frac{(c - 2)}{p}\,\sigma_e^2 \tag{xxxiii}$$


Thus we may also define a statistic whose expected value is the error variance by the relations 


O r E eae 
g =z Cba.co — Toa.cs) P - Yo es aoe cit: and Ess) oe g“ . a (xxxiv) 


We may tabulate these results as in Table 3, which includes the statistic defined by (xxx) above
and makes explicit the essential assumption in the derivation of the foregoing.

17.04 SIGNIFICANCE OF ESTIMATED REGRESSION PARAMETERS


Table 3 of 17.03 exhibits various sample statistics, one (s2) of which is necessarily an unbiased 
estimate of residual variance (07). The expected value of others will necessarily exceed the 
latter unless one or other of certain conditions specified in the column at the extreme right holds 
good. The expected value of the ratio of any one of them to sz will then exceed the expected 
value of the ratio of consistent estimates of variance. We have explored the distribution of 
such a ratio on the assumption that : (a) the score distribution in the parent universe is normal ; 
(b) the two variances are statistically independent. Before we can employ the sample statistics 
of Table 3 as a basis for testing whether one or other prescribed condition does in fact hold good, 
it will therefore be necessary to examine which we can pair off as independent statistics. 

The student who recalls the test for the significance of a correlation ratio in 16.08 will indeed 
recognise in Table 3 of 17.03 two independent estimates whose consistency is a criterion of the 
dependence of the B-score on the A-score, viz. those here denoted s} and sz. We may denote 
their ratio as 

$$F_{\eta} = \frac{\eta^2_{ba.cs}\,(p - c)}{(1 - \eta^2_{ba.cs})\,(c - 1)}$$


If (and only if) 0? = of, as when o; = 0, this is the ratio of 2 independent Chi-Square variates 
of (c — 1) and (p — c) degrees of freedom respectively, being therefore a Type VI variate. 
Otherwise, the expected value of $\eta^2_{ba.cs}$ will be greater than that of $(1 - \eta^2_{ba.cs})$. We must 
therefore regard an uncommonly high value of $F_{\eta}$ as: (i) a rarity if we are content to accept 
the null hypothesis that there is no causal nexus involved in the pair score distribution ; (ii) 
alternatively, as ground for dismissing the validity of the null hypothesis. 
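
The criterion just described may be illustrated numerically. What follows is a minimal sketch rather than the procedure of 16.08 itself: it assumes that the ratio has the form eta^2 (p - c) / ((1 - eta^2)(c - 1)) for a grid of c columns holding p B-scores in all, and both the function name correlation_ratio_f and the small grid of scores are hypothetical.

```python
from statistics import mean


def correlation_ratio_f(columns):
    """Return (eta squared, F) for B-scores grouped by A-score value.

    columns is a list of lists; each inner list holds the B-scores
    observed for one distinct value of the A-score.
    """
    c = len(columns)                          # number of A-score classes
    p = sum(len(col) for col in columns)      # total number of paired scores
    if c < 2 or p <= c:
        # With a single B-score in every column (p = c) the error
        # estimate is the ratio of two zeros.
        raise ValueError("need c >= 2 columns and p > c scores")
    scores = [b for col in columns for b in col]
    grand_mean = mean(scores)
    total_ss = sum((b - grand_mean) ** 2 for b in scores)
    between_ss = sum(len(col) * (mean(col) - grand_mean) ** 2 for col in columns)
    eta_sq = between_ss / total_ss
    f_ratio = (eta_sq * (p - c)) / ((1.0 - eta_sq) * (c - 1))
    return eta_sq, f_ratio


# Hypothetical grid: three A-score values, three B-scores for each.
grid = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]]
eta_sq, f_ratio = correlation_ratio_f(grid)
print(round(eta_sq, 4), round(f_ratio, 4))
```

To complete the test we should refer the ratio so obtained to a table of the variance ratio with (c - 1) and (p - c) degrees of freedom. The sketch stops short of that step.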

Scrutiny of Table 3 invites examination of the properties of two other ratios, one as a 
criterion of zero covariance ($k_{ba} = 0$), the other of linear regression, viz. : 


Oe) eee _ Chae =e (p 4) : 
ES A. lO 


In connexion with any use we may subsequently make of Fmu above and of either ratio in 
(xxiv), one property of the denominator calls for comment. If we have only a single B-score 
for each different value of the A-score, the sample variance of each column in the frequency 
grid is zero and M(V,.ac;) = 0. Such a unique relation between A-score and B-score values 
also signifies p = c, so that (p — c) = 0. Thus the statistic sí in Table 3 is indeterminate, being 
the ratio of two zeros. A desideratum of the three significance tests based on sé and discussed 
below is therefore that there are several non-identical B-scores for at least one value of the 
A-score. 

To get into focus the problems we are now ready to tackle we may with profit recall the 
assumptions of the test based on Fmu above as stated in 16.08. There the null hypothesis was 
that each set of B-scores corresponding to a particular A-score is a sample from the same universe 
as the set of B-scores associated with any other value of the A-score. If we look on each column 
of the score grid of 16.08 as a sample from a sub-universe of B-scores, our postulate is therefore 
that each such sub-universe is identical with any other. Formally this means that F; = 0 in 
our equation of B-score build-up, so that b;.; = e;.; + C. The constant C which is common 
to all B-scores regardless of the meaning attached to z then merely signifies that the B-score 
mean, unlike that of the e-score component, is not (necessarily) zero. If our null hypothesis 
includes the assumption that regression is linear but excludes the assumption that the B-scores 
and A-scores are independent, our model universe is no longer Bernoullian (i.e. homogeneous). 
Each column of the grid of 16.08 is then a unique sub-universe. The entire universe of B-scores 
is then a stratified universe. 

To define the commonly prescribed F-test for linearity, and a so-called exact test of 
significance based on (xix) of 17.03 we have to be clear about three things : 


(a) any such test assumes a normal distribution of the error component of the B-scores 
and is more or less exact only in so far as this postulate is more or less correct ; 

(b) whereas the universe of e-scores is homogeneous by hypothesis, sampling in the universe 
of B-scores is stratified unless the B-scores and A-scores are independent, in which 
case $\sigma_b^2 = \sigma_e^2$, and the issue of linearity does not arise ; 


(c) a Chi-Square variate is the sum of square standard scores of unit variance and as such 
involves the true and unknown value of the appropriate universe parameter which will 
disappear in the variance ratio only if it appears in both the numerator and denominator 
of the latter. 


We cannot legitimately employ the customary formula ($\sigma_m^2 = \sigma^2 \div n$) for the variance of 
the mean unless we are sampling in a homogeneous universe, in this case the universe of e-scores ; 
and the denominator $(1 - \eta^2_{ba.cs})$ of the F-ratio for linearity in (i) is, for reasons we shall state more explicitly 
at a later stage, a parameter of the e-score distribution alone. Consequently, (c) implies that 
any sample parameters of an F-ratio we invoke within the framework of the assumption that 
B-scores and A-scores are not necessarily independent must be expressible in terms of the e-score 
distribution alone. 

Our next task must therefore be to define in what circumstances it is possible to express the 
sample statistics (7, — Yig..s) in the numerator of F,, as defined by (i) above and s? as defined 
by (xvi) of 17.03 in terms which involve only error components and constants. We first recall 
the tautologies (‘Table 2) : 


a = EST a. pe T E(M, .acs 7 Xr. R Ja E(M,. acs “7 M,. NE Rog. cs Xa ; p. (11) 
(1 >: a ae A Es Ev. ar. a Xp. ar ei E(x. ros M,. TY Ria cs. pe. e y (iii) 
We shall also need to make use of (vi) and (viii) 17.03, viz. : 


i= 


Bets, (kos. cs — kr) = 2 A cP EAM . (iv) 


i=1 | 
If regression is linear, Ms. as = kroa. Xa. s + Me.s Also, in any case 


Mao — My oe) == Mas ass and Mo... — My. s) Mis > 


by (vii) and (xi) of 17.01, whence we can write (ii) in the form 
Vd. ce = Ea (Mo. ose — My. as) — (Mo. os — My. a) — (Roa. cs — Era) Xa. al? 
= Es (Mo. acs — Me. cs) — (Roa. es — Roa) Xa. 3]? 
E isa t OE Vo 
— 2M... es Ed Mo. acs) 
RI do RdA) 
a Mig ea a pee ens) 
In this expression E,(M,. acs) = Me. es and E,(X,.,) =0, whence from (iv) 
a) cs EME al Me cg PV) 
We may write this alternatively as 


i=c 
2 2 2 2 2 
acid aa Taio? ‘ Vos p3 Îi. M: ics P s Moi (Roa. ce Roa) P i Va. s 
EEE A ee E E 
o, o, o, 0% 


Since our universe of e-score components is homogeneous, we may write the variances of the 
mean e-score for a sample of $p_i$ associated with a fixed value of the A-score, and for a p-fold 
sample associated with all the A-scores of the fixed set, respectively as 

$$\sigma^2_{m.i} = \frac{\sigma_e^2}{p_i} \quad \text{and} \quad \sigma^2_m = \frac{\sigma_e^2}{p}.$$

Whence the foregoing expression becomes 
2 2 i=0C 2 2 
(Noa. cs SS coe E Vo. ey > >a Mo, des DE cs (Roa. ore Roa)P . Va. 8 (vi) 
2 = TES o oly 2 y : 
0, e=1 Úm.i Om Te 


Since the true mean of the errors for a fixed value of the A-score or for any fixed-A set 
as a whole is zero, the first term on the right is the true variance of the e-score means for a fixed 
A-score, being therefore a Chi-Square variate of c degrees of freedom. The second is a Chi-Square 
variate of 1 d.f. and the third is, being as already shown a square standard normal score, 
also a Chi-Square variate of 1 d.f. If we can make the appropriate orthogonal transformation to 
express the second and third terms of (vi) as Chi-Square variates of 1 d.f. included in the first, 
the expression on the left is therefore a Chi-Square variate of (c - 2) d.f. What follows is an 
outline of the proof. We first put, as in 16.02 and 16.04: 


i=c 2 i= i=C 2 

Pi MG iu MÁ ics oe 
5 tia Y e ë o Gil 
¿=1 o; = i=1 m.i 


es Eros ‘<= V pi. Ci. Mo. sos 
Thus $u_1$ is a linear function of the standard e-score means in accordance with (viii) and satisfies 
the orthogonal condition that each u-score is a score of unit variance since the sum of the squares 
of the linear constants $(p_i \div p)^{1/2}$ is unity. In virtue of (vi), we may also put 

Thus $u_2$ is a linear function of the standard e-score means and satisfies the orthogonal condition 

The definition of $u_1$ and $u_2$ is also consistent with the essential orthogonal condition that the 
sum of the cross products of the linear coefficients vanishes, since this sum is 

We may thus write (vi) in the form 


2 2 = = 
Hence the statistic so defined is a Chi-Square variate of (c - 2) degrees of freedom. The 
denominator of the F-ratio for linearity in (i) is at once expressible in terms of the e-score distribution since 

In the last expression we may put as in 16.04 

We may then define the value of $w_i$ in the range $i = 1$ to $i = c$ as 

Thus the statistic on the left is a Chi-Square variate of (p - c) degrees of freedom and we may 
write 

In deriving this expression we have defined by (xii) the w-scores excluded in the denominator 
in the same way as the u-scores of (viii). Thus the numerator and denominator of the ratio 
are statistically independent. We have been able to express the numerator in terms of the e-score 
distribution only because we assume linearity of regression. The expectation of the numerator 
of (xiii) will exceed this value on the contrary assumption. Thus the variance ratio whose 
distribution provides a criterion of departure from linearity is 

The ratio of the square of the deviation of the regression coefficient from its expected value 
to the unbiased estimate of the variance of its distribution we have seen to be 

For what follows we may write this in the form 


== Wrecked Var gy 
(1 TN vocal . Vo. cs 
All the essential relations for a significance test of the departure of the regression coefficient 
from its expected value are implicit in the foregoing derivation of F,,. We first adapt (iii) 
above as follows : 
(1 ES pen se eet EEr. Asy, EE M,. “ae > (M, . E aie M,. an Sx (Roa. es AY. s 
The first term on the right is a Chi-Square variate of p degrees of freedom, and each of the 
remaining terms is a Chi-Square variate of 1 d.f. Moreover, we have shown above that we can 
express the two latter as square w-scores included in the p-fold sum of square w-scores 
equivalent to the first term. Thus the expression on the left of the equation is a Chi-Square 
variate of (p - 2) degrees of freedom ; and the standardised sum of squares in the numerator 
of $F_k$ in (xv) is equivalent to a w-score we have eliminated from the denominator. In short, 
$F_k$ in (xv) is the ratio of a Chi-Square variate of 1 d.f. to a Chi-Square variate of (p - 2) d.f. 
defined more compactly as 

$$F_k = \frac{(p - 2)\,(k_{ba.cs} - k_{ba})^2\, V_{a.s}}{(1 - r^2_{ab.cs})\, V_{b.cs}} \qquad \text{(xvi)}$$
This ratio defines the distribution of the deviation of the regression coefficient from its 
expected value whether the latter does or does not numerically exceed zero. If $k_{ba} = 0$ it is 
identical with the F-ratio for zero covariance in (i); but we cannot otherwise express the distribution 
of the product moment index as a Type VI variate. In fact, of course, an exact test for zero covariance is 
redundant, since it suffices 
(a) to test first whether $\eta_{ba}$ significantly exceeds zero by recourse to the ratio denoted by 
$F_{\eta}$ above, as in 16.08 ; 
(b) to test subsequently whether regression is linear by recourse to the F-ratio in (xiv). 
The reader will note that the square root of $F_k$ as defined above is a t-variate of (p - 2) degrees 
of freedom, since the numerator has only 1 d.f. Thus we may use the t-table. Similarly, the F-ratio 
for zero covariance in (i) is the square of a t-variate ; and we may test for zero covariance, the appropriate t-ratio being 
$$t = \frac{r_{ab.cs}\sqrt{p - 2}}{\sqrt{1 - r^2_{ab.cs}}} \qquad \text{(xvii)}$$
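
The equivalence of the two approaches to a test for zero covariance admits of an arithmetical check. The sketch below assumes that the variance ratio for the regression coefficient has the form (p - 2)(k - k0)^2 V_a / ((1 - r^2) V_b), where V_a and V_b are sample variances with divisor p. The function name slope_variance_ratio and the five paired scores are hypothetical.

```python
from math import sqrt
from statistics import mean


def slope_variance_ratio(a_scores, b_scores, k_hypothesis=0.0):
    """Return (k, r squared, F); F has 1 and (p - 2) degrees of freedom."""
    p = len(a_scores)
    if p != len(b_scores) or p < 3:
        raise ValueError("need at least 3 paired scores")
    m_a, m_b = mean(a_scores), mean(b_scores)
    v_a = sum((a - m_a) ** 2 for a in a_scores) / p
    v_b = sum((b - m_b) ** 2 for b in b_scores) / p
    cov = sum((a - m_a) * (b - m_b) for a, b in zip(a_scores, b_scores)) / p
    k = cov / v_a                       # sample regression coefficient of B on A
    r_sq = cov ** 2 / (v_a * v_b)       # square of the product moment index
    f_ratio = (p - 2) * (k - k_hypothesis) ** 2 * v_a / ((1.0 - r_sq) * v_b)
    return k, r_sq, f_ratio


# Hypothetical paired scores.
a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [2.1, 3.9, 6.2, 7.8, 10.1]
k, r_sq, f_zero = slope_variance_ratio(a, b)    # hypothesis: zero covariance
t_zero = sqrt(f_zero)                           # t-ratio of (p - 2) d.f.
f_from_r = (len(a) - 2) * r_sq / (1.0 - r_sq)   # same ratio written in terms of r
print(k, t_zero)
```

When the hypothetical value of the regression coefficient is zero, f_zero and f_from_r agree apart from rounding, so that either form leads to the same entry in the t-table.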
The derivation of a so-called exact test corresponding to (xxii) of 17.03 introduces no new 
issue of principle. If we have two independent samples of $p_1$ and $p_2$ paired scores respectively, 
we shall have two estimates of residual variance ($\sigma_e^2$), viz. : 

Hence we may test whether the residual variation is the same in both samples by the variance 

If this ratio does not exceed its expected value unduly, i.e. if it satisfies what criterion of significance 
we agree to adopt, we may proceed as in 13.05 and 16.07 basing our estimate of the 
residual variance on the mean. Thus we may write 

Accordingly we may define a statistic by the relations 
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The variance (c7.,) of the distribution of the difference (kay. — Rav.2) is the sum of the 
variances of the distributions of k,,., and Ra». a, 1.€. 
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Hence we may take as our unbiased estimate of ož., 
We thus obtain a t-ratio of $(p_1 + p_2 - 4)$ degrees of freedom : 


t, = (kav -1 — Ras 2) : ; : ; (xx) 
Sk. c 

We have still to dispose of an issue mentioned in the opening paragraph of this section, viz. 
what is the probability that a particular observation will exceed its estimated value given by 
the regression line ? We can set approximate confidence limits to the regression score, if we 
assume that the distribution of the e-scores is normal. The deviation of an observed value of 
the B-score from the regression estimate is by definition 
$(x_{b.a} - x_{r.acs}) = (x_{b.a} - M_{b.cs} - k_{ba.cs} \cdot X_{a.s})$. 
For the same fixed value of the A-score within the fixed-A set the expected 
value of this is 

$$E(x_{b.a} - M_{b.cs} - k_{ba.cs} \cdot X_{a.s}) = M_{b.as} - M_{b.s} - k_{ba} \cdot X_{a.s} = 0.$$

We may thus write the deviation of $(x_{b.a} - x_{r.acs})$ from its expected value as 

$$(x_{b.a} - M_{b.as}) - (M_{b.cs} - M_{b.s}) - (k_{ba.cs} - k_{ba}) X_{a.s} = e_{b.a} - M_{e.cs} - (k_{ba.cs} - k_{ba}) X_{a.s}.$$


We have shown above that the squares of each term in this expression expressed in standard 
form are independent Chi-Square variates, if the distribution of the e-scores is normal. Consequently, 
we may regard it as the sum of independent components on that assumption ; and 
since the variance of the distribution of a raw-score deviation from its mean is necessarily that 
of the score itself, we may regard the variance of $(x_{b.a} - x_{r.acs})$ for a fixed A-score as the 
sum of 3 additive components. If the estimate $k_{ba.cs}$ (and hence the value of $x_{r.acs}$ and 
$M_{b.cs}$) is referable to p paired scores, components are 


Component        Variance 
$e_{b.a}$        $\sigma_e^2$ 
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If we write the variance of (x,.4 — *r. aes) as 02, ,, we thus have 
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For computation it is more convenient to write this as 
1 A* 
a hor — 
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Since this expression involves V,.,, it presupposes the Model I approach, i.e. sampling 
with the sub-universe of the fixed-A set. If the e-score is normal, as we also assume in this 
context, the deviation (x,..,— X». acs) involves the differences of independent normal variates, 
being therefore itself a normal variate. Thus a deviation (x,..— X».aes) from its expected 
value if as great as 2c,,, will occur about 1 in 20 observations in the long run. Actually, we 
cannot assign an exact value to of, and must use our unbiased estimate s? of (xvi) in 17.03. 
For an approximate normal test (when p is large) the appropriate square c-ratio is therefore 
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By hypothesis the expected value of the numerator in the above is zero, and F,, being the ratio 
of a square standard normal score to the unbiased estimate of its variance, is a £ variate of (p — 2) 
degrees of freedom. 
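
The additive build-up of the variance of a deviation from the regression estimate may be sketched in a few lines. The sketch rests on an assumption which the reader should check against the table of components above: that the three components are the error variance itself, the variance of the mean error (error variance divided by p) and the contribution of the regression coefficient (error variance multiplied by the square of the A-score deviation and divided by p times the variance of the fixed-A set). The function name deviation_variance and the numerical values are hypothetical.

```python
from math import sqrt


def deviation_variance(sigma_e_sq, p, x_dev, v_a):
    """Variance of (observed B-score - regression estimate) at one A-score.

    sigma_e_sq : error variance, assumed known for this sketch
    p          : number of paired scores on which the line is based
    x_dev      : deviation of the A-score from the mean of the fixed-A set
    v_a        : variance of the fixed-A set (divisor p)
    """
    own_error = sigma_e_sq                               # error of the observation
    mean_error = sigma_e_sq / p                          # error of the fitted mean
    slope_error = sigma_e_sq * x_dev ** 2 / (p * v_a)    # error of the fitted slope
    return own_error + mean_error + slope_error


# Hypothetical values: the variance grows as the A-score departs from its mean.
at_mean = deviation_variance(4.0, 10, 0.0, 2.0)
off_mean = deviation_variance(4.0, 10, 2.0, 2.0)
print(at_mean, off_mean, 2.0 * sqrt(off_mean))
```

The last figure printed is twice the standard deviation, i.e. the deviation we expect to be exceeded about once in 20 observations if the errors are normal and the error variance is known.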


17.05 THE METHOD OF LEAST SQUARES 


When we speak of a sample statistic such as $k_{ba.cs}$ defined by (i) of 17.02 as the best estimate 
of a parameter (e.g. $k_{ba}$ in the notation of 17.01) of a universe, we may mean that it satisfies 
either or both of two criteria : (a) lack of bias ; (b) efficiency. An unbiased estimate is a sample 
statistic whose long run mean value, i.e. mean value for an indefinitely large number of independent 
samples, is exactly equal to the corresponding universe parameter. We have seen 
(p. 725) why $k_{ba.cs}$ defined by (i) of 17.02 is in fact an unbiased estimate of $k_{ba}$; and we shall 
now ask whether it is the most efficient one. 

We have had occasion to refer elsewhere to the concept of efficiency, but have hitherto 
formulated no general procedure for defining a sample statistic with due regard thereto. In 
this context two results established in 17.03 simplify our task. We have seen that the distribution 
of the regression coefficient is normal if : (a) regression is linear ; (b) the distribution of errors 
is normal. When the sample distribution of an estimate is normal, it is possible to define in simple 
terms a criterion of its statistical efficiency. We speak of one estimate as more efficient than 
another if we can assert with equal confidence that the true value lies within a smaller range 
of values. When the distribution of the estimate, in this context $k_{ba.cs}$, is normal with variance 
$\sigma_k^2$, the corresponding standard score of unit variance is $(k_{ba.cs} - k_{ba}) \div \sigma_k$; and we can 
assert with 95 per cent. confidence that $k_{ba}$ lies within the range $k_{ba.cs} \pm 2\sigma_k$, as indicated in 
16.05 on p. 696. To make the confidence range of $k_{ba}$ as small as possible we therefore have to 
define $k_{ba.cs}$ in such a way that $\sigma_k^2$ is smaller than the variance of any other normally 
distributed estimate of $k_{ba}$. 
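
The meaning of the 95 per cent. confidence range may be made tangible by a sampling experiment. The sketch below is illustrative only. It assumes linear regression, normal errors of known variance and a fixed-A set of ten equally spaced scores, and it takes the standard deviation of the fitted coefficient to be the error standard deviation divided by the square root of the sum of squared A-score deviations. The names fitted_slope and coverage and all numerical settings are hypothetical.

```python
import random
from math import sqrt


def fitted_slope(a_scores, b_scores):
    """Return the least squares slope and the sum of squared A-deviations."""
    p = len(a_scores)
    m_a = sum(a_scores) / p
    m_b = sum(b_scores) / p
    s_aa = sum((a - m_a) ** 2 for a in a_scores)
    s_ab = sum((a - m_a) * (b - m_b) for a, b in zip(a_scores, b_scores))
    return s_ab / s_aa, s_aa


def coverage(k_true=1.5, sigma_e=2.0, trials=2000, seed=1):
    """Fraction of samples whose range k +/- 2 sigma_k includes k_true."""
    rng = random.Random(seed)
    a_scores = [float(i) for i in range(1, 11)]     # the fixed-A set
    hits = 0
    for _ in range(trials):
        # B-score = constant + linear component + normal error
        b_scores = [3.0 + k_true * a + rng.gauss(0.0, sigma_e) for a in a_scores]
        k_hat, s_aa = fitted_slope(a_scores, b_scores)
        sigma_k = sigma_e / sqrt(s_aa)
        if abs(k_hat - k_true) <= 2.0 * sigma_k:
            hits += 1
    return hits / trials


print(coverage())
```

In the long run the fraction printed should lie close to 0.95. Any estimate of larger variance would require a wider range to justify the same assertion.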

The principle of minimal variance last stated is another name for what has long been in use 
among physicists as a curve fitting device under a different name. The so-called method of 
least squares often invoked to introduce and to justify the use of the sample statistic defined 
by (i) as an estimate of the constant of a physical law expressed in linear form does in fact 
justify the assertion (Appendix II) that it is an unbiased one, as we have seen to be true 
(17.01) for other reasons. We shall now see that the statistic so computed has maximal 
efficiency, i.e. that $k_{ba.cs}$ so defined has minimal variance, if we assume a normal distribution 
of errors. 

Actually, we do not know the exact value of $\sigma_k^2$, but we can regard the ratio of $(k_{ba.cs} - k_{ba})$ 
to the square root of the unbiased estimate $s_k^2$ of the latter as a t-variate. Whence our problem is to ensure that $s_k^2$ 
is a minimum. In (xvii) of 17.03 we have exhibited $s_k^2$ as a linear function of $s_e^2$ within the fixed 
A-set. Whence it suffices to define $k_{ba.cs}$ so that $s_e^2$ is a minimum. We define the term $x_{r.acs}$ 
in the expression for $s_e^2$ to be a point on a line of which the equation is 

$$x_{r.acs} = M_{b.cs} + k_{ba.cs} \cdot X_{a.s}$$

If we use $E = E_a \cdot E_{b.a}$ for brevity 

$$E(x_{b.a} - x_{r.acs})^2 = E(x_{b.a} - M_{b.cs})^2 - 2k_{ba.cs} \cdot E(x_{b.a} - M_{b.cs})X_{a.s} + k^2_{ba.cs} \cdot E(X^2_{a.s}).$$
Whence from (xv) in 17.03: 
In this expression Cov $(x_{a.cs}, x_{b.cs})$ is the sample covariance of the A-scores and the B-scores. 
If we are now to define $k_{ba.cs}$ in such a way that $s_e^2$ is a minimum, we must put 

$$\frac{d s_e^2}{d k_{ba.cs}} = 0$$

In defining $k_{ba.cs}$ in such a way that it is an unbiased estimate of $k_{ba}$, as shown in 17.01, 
we have thus defined it so that $s_e^2$ and $s_k^2$ are each a minimum. Since we can express $(k_{ba.cs} - k_{ba})$ 
in terms of $s_k$ as a t-variate, we have therefore so defined it as to make its confidence range as 
parsimonious as possible. 
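
The minimal property just established is easy to verify by arithmetic. The sketch below assumes that the line passes through the sample means and uses the divisor p for the residual mean square. The choice of divisor does not affect the value of the coefficient which makes the residual a minimum. The function names and the paired scores are hypothetical.

```python
def residual_mean_square(a_scores, b_scores, k):
    """Mean square deviation of B-scores from a line of slope k through the means."""
    p = len(a_scores)
    m_a = sum(a_scores) / p
    m_b = sum(b_scores) / p
    return sum((b - m_b - k * (a - m_a)) ** 2
               for a, b in zip(a_scores, b_scores)) / p


def least_squares_slope(a_scores, b_scores):
    """Sample covariance of A and B divided by the sample variance of A."""
    p = len(a_scores)
    m_a = sum(a_scores) / p
    m_b = sum(b_scores) / p
    cov = sum((a - m_a) * (b - m_b) for a, b in zip(a_scores, b_scores)) / p
    v_a = sum((a - m_a) ** 2 for a in a_scores) / p
    return cov / v_a


# Hypothetical paired scores.
a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [2.1, 3.9, 6.2, 7.8, 10.1]
k_best = least_squares_slope(a, b)
smallest = residual_mean_square(a, b, k_best)
for shift in (-0.5, -0.1, 0.1, 0.5):
    # Any other slope inflates the residual by shift squared times the A-variance.
    print(shift, residual_mean_square(a, b, k_best + shift) - smallest)
```

Each difference printed is positive and equal, apart from rounding, to the square of the shift multiplied by the variance of the A-scores, which is 2 for the scores chosen.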




17.06 REGRESSION IN THE DOMAIN OF CONCURRENCE 


In statistical enquiries it may happen that observational data involving two variates, e.g. family 
income and sickness rates of mothers, appear to cluster near a straight line when plotted on 
graph paper. It is then possible to assign by the foregoing procedure a straight line of best 
fit for the regression of one variate on the other, e.g. mother’s sickness rate on family income. 


It is customary to speak of the equation definitive of the fitted line as a regression equation, 
and to regard it as a device for predicting the value of one variate when we know the other. 
Needless to say, prediction in this context means at best assigning confidence limits to a so-called 
expected value of the variate ; but the legitimacy of doing so raises issues quite outside the 
scope of considerations which justify the argument of 17.02-17.03 in the domain of physical laws. 

As we have now seen, the procedure we call the derivation of a regression line is what 
physicists call the least square method of determining the best value of a physical constant ; 
and its rationale in this context of physical laws implicitly signifies what we have elsewhere 
designated a consequential relationship. With Pearson’s collaboration, Galton, to whom the 
term regression is due, applied it to such situations as the concomitant variation of physical 
measurements of relatives, e.g. when one plots the height of one member of a twin pair against 
the height of the other; but in such situations the relationship involved is concurrent (vide 
8.01 in Vol. I), and it is by no means clear that mathematical assumptions appropriate to a 
theoretical analysis of the sampling process in the consequential domain of a physical law are as 
relevant to concurrent relationships as Pearson believed. 

A paradox which confronts the student in a different context may serve to focus attention 
on the need to scrutinise such assumptions when we transfer them to the domain of concurrence. 
When the relationship under discussion is consequential, the square of the correlation coefficient 
is a precise measure of explained variance defined by the relation of = 72, . of of (xvi) in 17.01 ; 
but this is not a rule universally applicable to situations in which linear regression arises. This 
we shall see more fully in the next chapter. Here we may dispel a difficulty which otherwise 
confronts the student, if we anticipate a conclusion established later, when we derive (p. 791) 
the correlation coefficient of two tests A and B as 7,, = a,b, in terms of their so-called com- 
munalities a? and bi. Hence for two tests with the same communality fap = a> which is the 
fraction of the variance of the test score distribution attributable to a component common to 
each set. Thus we identify the explained fraction of variance with the correlation coefficient 
itself in contradistinction to its square. 

We can get some light on this seeming inconsistency, if we recall the simplest form of the 
umpire bonus model of Chapter 9 in Vol. I, the score system being 


$$x_a = x_u + x_{a.o} \; ; \qquad x_b = x_u + x_{b.o}$$


In the consequential domain of the relation between the player's score and that of the umpire 
we then have 

$$r_{au} = \frac{\sigma_u}{\sigma_a}$$

In the concurrent domain of the two players' scores we have 

$$r_{ab} = \frac{\sigma_u^2}{\sigma_a \sigma_b} = r_{au} \cdot r_{bu}$$

If both players toss the same die the same number of times $\sigma_a = \sigma_b$ so that $r_{au} = r_{bu}$ and 

$$r_{ab} = r^2_{au} = \frac{\sigma_u^2}{\sigma_a^2}$$
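
A card-table experiment of this sort is easy to imitate with a random number generator. The following sketch assumes the simplest umpire bonus set-up: the umpire's score is the total of two tosses of a die, and each player adds to it the total of two tosses of his own. On that assumption the umpire's score accounts for half the variance of either player's score. The function names and the settings are hypothetical.

```python
import random
from math import sqrt


def correlation(x, y):
    """Product moment correlation of two equally long lists of scores."""
    n = len(x)
    m_x, m_y = sum(x) / n, sum(y) / n
    cov = sum((xi - m_x) * (yi - m_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - m_x) ** 2 for xi in x)
    ss_y = sum((yi - m_y) ** 2 for yi in y)
    return cov / sqrt(ss_x * ss_y)


def umpire_bonus(n_umpire=2, n_own=2, trials=20000, seed=7):
    """Return (r_au, r_bu, r_ab) for the simulated score system."""
    rng = random.Random(seed)

    def tosses(n):
        return sum(rng.randint(1, 6) for _ in range(n))

    x_u, x_a, x_b = [], [], []
    for _ in range(trials):
        u = tosses(n_umpire)               # component common to both players
        x_u.append(u)
        x_a.append(u + tosses(n_own))      # player A: umpire score + own tosses
        x_b.append(u + tosses(n_own))      # player B: umpire score + own tosses
    return correlation(x_a, x_u), correlation(x_b, x_u), correlation(x_a, x_b)


r_au, r_bu, r_ab = umpire_bonus()
print(round(r_au ** 2, 2), round(r_ab, 2))
```

With these settings both figures printed should lie near one half. In the consequential player-umpire relation it is thus the square of the correlation coefficient which measures the fraction of variance due to the common component. In the concurrent relation between the two players it is the coefficient itself.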

To the present writer, it seems that this distinction resolves the paradox under discussion 
when we consider the way in which we derive a line of best fit by the method of the last two 
sections. In plotting the results of a physical experiment we may distinguish between two 
procedures : (a) each value of the so-called dependent variate, e.g. the stretch of a spring, plotted 
against a particular value of the other variate may truly correspond to one value of the latter, 
as when we successively measure the stretch produced by adding one and the same load to the 
scale pan; (b) each value of the dependent variate (e.g. blood sugar) plotted against one and 
the same value of the other variate (e.g. insulin dosage) involves an unacknowledged error of 
observation in the measurement of the latter. Either way, the customary procedure in the 
conduct of an experiment entails what we have tacitly assumed in fitting a line to our observations 
by the method of least squares, viz. that all the errors of observation arise in assigning a value 
to the so-called dependent variate. 

In terms of our model situation, we may therefore say that we treat the situation as a player- 
umpire relation whether our laboratory procedure does (b) or does not (a) involve concurrent 
liability of both variates to error. If, in reality, both variates of an experimental set-up are 
subject to error of observation, our method of plotting our observations transfers errors of one 
sort to the opposite side of the balance sheet, as if we were to assign to player B (the dependent 
variate) the score $x_b = x_u + x_{a.o} + x_{b.o}$ and the score $x_a = x_u$ to player A in the 
umpire-bonus set-up. 

In theory $k_{ab}$ is the reciprocal of $k_{ba}$ when there is no error variance, i.e. when $r_{ab} = 1$, 
since in that event 

$$k_{ab} = r_{ab} \cdot \frac{\sigma_a}{\sigma_b} = \frac{\sigma_a}{\sigma_b} \; ; \qquad k_{ba} = r_{ab} \cdot \frac{\sigma_b}{\sigma_a} = \frac{\sigma_b}{\sigma_a} \qquad \text{(i)}$$

In laboratory practice, of course, this is not so; but the fact that application of the method of 
17.02 leads to two different lines of best fit does not constitute a dilemma. In the laboratory 
there is commonly a clear-cut operational distinction between the variate we deem to be 
dependent (e.g. volume) and the alternative one, i.e. the one which is more amenable to direct 
control (e.g. pressure). In applying the method of 17.02 to laboratory data we do not then have 
to make a choice between two ways of fitting a line. Admittedly, this is not always so. The 
laboratory worker may be free to choose one of two procedures : (a) to measure the stimulus 
requisite to produce a fixed response ; (b) to measure the response evoked by a fixed stimulus. 
In either case, however, fixing the value of the so-called independent variable may in fact be 
subject to experimental error, neglected by the way we plot our data. In terms of the allocation 
of errors to one or other side of the balance sheet, the two procedures are not identical. 
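
The relation between the two lines of best fit may be checked numerically. The sketch below assumes the usual definitions, viz. that each regression coefficient is the sample covariance divided by the variance of the variate we treat as independent, so that the product of the two coefficients is the square of the correlation coefficient. The function name two_regressions and both sets of scores are hypothetical.

```python
from math import sqrt


def two_regressions(a_scores, b_scores):
    """Return (k_ba, k_ab, r_ab) for paired scores."""
    p = len(a_scores)
    m_a = sum(a_scores) / p
    m_b = sum(b_scores) / p
    v_a = sum((a - m_a) ** 2 for a in a_scores) / p
    v_b = sum((b - m_b) ** 2 for b in b_scores) / p
    cov = sum((a - m_a) * (b - m_b) for a, b in zip(a_scores, b_scores)) / p
    k_ba = cov / v_a                 # regression of B on A
    k_ab = cov / v_b                 # regression of A on B
    r_ab = cov / sqrt(v_a * v_b)
    return k_ba, k_ab, r_ab


a = [1.0, 2.0, 3.0, 4.0, 5.0]
b_exact = [3.0, 5.0, 7.0, 9.0, 11.0]     # no error: B = 2A + 1
b_noisy = [3.2, 4.7, 7.4, 8.6, 11.1]     # the same law with errors of observation

exact = two_regressions(a, b_exact)
noisy = two_regressions(a, b_noisy)
print(exact[0] * exact[1], noisy[0] * noisy[1], noisy[2] ** 2)
```

When there is no error the product of the two coefficients is unity, so that each is the reciprocal of the other. Otherwise the product falls short of unity and the two lines of best fit no longer coincide.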

In laboratory enquiry, the very fact that one variable is under the control of the investigator 
signifies that the relation sought is consequential. On the other hand, statistical enquiries in 
the domain of sociology, psychology and biology commonly confront us with concurrent relation- 
ships of which the common element is not under control. The end in view may decide the 
proper choice of one or other variate as dependent, i.e. the variate x, when we speak of the 
regression of x, on x,; but what legitimate aims we may indeed pursue raises issues foreign 
to the considerations which commend the methods of 17.03-17.04 in the domain of experiment. 
If we plot weights of schoolboys against age (or vice versa), we may adopt one of two procedures. 
In these days of computing machines, it is common practice to determine a value of Cov $(x_a, x_b)$ 
based on the cross-products of all the scores, and it is no longer clear that we have to conceive 
the sampling process as restricted to the sub-universe of the fixed-A set or of the fixed-B set. 
Alternatively, we may group all children of over 8 years and no more than 8½, labelling the age 
of the group as 8¼ years. Our calculated $r_{ab}$ will then be based on cross-products of the B-scores 
(weights) and the corresponding fixed age group medians. This procedure is superficially more 
like laboratory procedure than is the alternative; but the likeness holds good only in the 
domain of arithmetic. 

The implications of the use of curve-fitting by least squares do not admit of any formidable 
ambiguities in laboratory practice ; and if we fully understand the implicit, as well as the explicit, 
assumptions we make when we use the methods of 17.03-17.04 for the analysis of experimental 
data, we shall avoid the pitfalls which beset us when we use the technique of regression in 
statistical enquiries outside the laboratory. Contrariwise, a too facile view of the similarities 
between the two situations will assuredly lead us astray. At the start, we should be clear about 
the implications of the fact that the method of least squares transplanted into the field of biology 
and sociology by Pearson was originally a theory of error, a basic assumption being that the 
physicist can control every relevant variable in an experimental set-up other than variation 
arising from unreliability of his recording apparatus, such variation being free from systematic 
bias. In biological experiment, one has commonly to take stock of individual variation, e.g. 
with respect to genetic constitution ; but the investigator, with a justifiable intention of propounding 
a law, implicitly assumes the possibility of repeating observations based on different 
individuals without introducing a systematic source of variability. 

In fact, we assume more than this when we invoke statistical tests dealt with in this context. 
Our postulate is that the source of residual variation is the same for all samples ; and this makes 
the Principle of the Fixed-A set the king-pin of our theoretical edifice. The postulate itself is 
admissible in comparison of physical experiments in which investigators of equal competence 
employ the same instruments or instruments of equal precision ; but we shall shun the temptation 
to regard statistics as an efficacious remedy for shoddy experiments in the biological domain, 
if we are alert to the need for factual support to sustain the proposition that the non-systematic 
components of variation in different samples of living creatures are necessarily equivalent. Only 
the strictest attention to selection of stocks standardised with respect both to nature and to nurture, 
age and season, can confer plausibility of any such assumption implicit in what Churchill 
Eisenhart calls the Model I approach. The admissible postulates of physical experiment, and 
those the biologist may be able to adopt with justifiable confidence on that understanding, are 
at least open to grave doubt in many situations which prompt sociologists and psychologists to 
employ regression equations. In such enquiries, what is usually a more important source of 
variation is a complex of external agencies we have no power to control. Were it otherwise, our 
residual variance would be simply a measure of the failure of our powers of observation to detect 
a law of nature. As it is, our residual variance is to no small extent a record of the inadequacy 
of any simple law as a valid description of our observations, and an admission of our powerlessness 
to recreate a unique historic event. 

To make the last assertion more tangible let us recall the law of the stretched spring. When 
we state such a law, the end in view is to tell us by how much we can extend a spring, if we 
measure the extension with sufficient accuracy under specified loads. A latent assumption is 
that our laboratory is static. The results would indeed be different if we made our observations 
in an aeroplane at different (and unknown) heights above sea level in virtue of variations w.r.t. 
the gravitational constant g. The best we could then hope for is that we could distribute our 
observations on the stretch with respect to a specified tension so that differences with respect 
to elevation would be uniformly distributed. Even in the absence of error inherent in the technique 
of observation as such, our line of best fit could then tally with the one definitive of the 
physical law of the static laboratory only in so far as it described the trend of averages. Figuratively 
speaking, the laboratory of the social scientist and of the vital statistician is always an 
aeroplane of unknown and changing height above sea level. Errors of observation in the 
Gaussian sense may be, and indeed commonly are, trivial components of the residual variation 
undetermined by the course of the regression line. 

If it is important on this account to recognise that we cannot rightly equate the residual 
variation of the sociologist or of the vital statistician to instrumental or personal errors of observation 
as in experimental science, it is no less important to recognise that any statement of a scientific 
law is complete only in so far as it implies a specification of its own limitations. The laboratory 
worker familiar with such limitations can commonly shirk the obligation to make them explicit 
without compromising the usefulness of conclusions drawn from the law itself. Thus we can 
safely use an equation prescribing how the density of water varies in relation to temperature 
at 760 mm. atmospheric pressure without incurring the temptation to invoke its aid to prescribe 
the density of steam at 120°C. and sea-level pressure. We learn at school that Hooke’s law 
breaks down if the extension approaches breaking point, and that Van der Waals’ equation has 
to replace Boyle’s simpler and for most purposes good enough rule in the neighbourhood of 
absolute zero or of the critical pressure. The explicit algebraic formulation of a physical law is 
always incomplete from this viewpoint, but the experimentalist translates it in action with the 
reservation that the correct interpretation carries with it a supplementary specification of the 
boundary conditions of its validity. To say this is to say that the legitimate use of an equation 
definitive of a structural law in physics lies within the domain of interpolation ; and the teaching 
of elementary physics familiarises us with the absurdities which arise when we use it for 
extrapolation beyond the boundaries of its applicability. 

This is indeed precisely comparable to what we do, if we succumb to the temptation of 
using a regression equation as a basis for predicting how a wage increase will affect fertility 
or infantile mortality. What is a sufficiently well recognised truism in experimental science 
is a caveat we too easily ignore in sociology and vital statistics. For instance, we cannot legi- 
timately infer from the regression of completed family size on family income what the completed 
family size would be, if we stabilised all incomes at a fixed level, thereby changing the framework 
of conditions in which the regression relation is valid. The statistical literature of the last fifty 
years abounds with conclusions of this type, though it is easy to detect the fallacy, if we take 
stock of a fundamental difference between experimental investigation and statistical description. 

We have already had occasion to recognise that there is a clear-cut distinction in experimental 
science between what we commonly call the dependent and independent, or as we might more 
informatively say consequent and antecedent variates. The antecedent (so-called independent) 
is the one which the investigator has under his direct and deliberate control; and commonly, 
though not always, it is the only one within his power to control with ease. For instance, we 
cannot fill a hypodermic syringe with adrenalin by raising the blood pressure of the patient ; 
but one can raise the blood pressure of the patient by injection of the contents of a syringe 
containing adrenalin. 

Now we recognise a relationship as consequential because, and only because, we are able to 
interfere actively with the course of events ; but we are not recording the result of any such active 
interference when we plot a regression graph of completed family size or maternal morbidity 
on family income. At least as likely as not, the relationship involved is concurrent ; and our 
plotted data cannot give us any assurance to the contrary. The algebraic treatment of correlation 
in Chapter 12, in contradistinction to the more customary geometrical approach, can indeed 
make this logical distinction explicit. We can influence the score of player A, if we record wrongly 
the result of the umpire’s score ; but we cannot do so by recording the score of player B wrongly. 

In this context, however, a factual as opposed to a schematic illustration may prove more 
helpful. We may imagine a situation not uncommon in Asia or Africa, viz. a population subject 
to malaria spread over a dry hillside and swampy lowlands around it, the more prosperous 
Herrenvolk householders having settled on the heights. In the nature of the case, we should 
then expect to find a correlation between mean income and malaria incidence in the various 
precincts, and it might well happen that we could plot our statistics as a linear regression graph. 
In this set-up raising the income of the less prosperous sections of the community might permit 
more migration from the swampy lowlands and hence less risk of malaria, but only if there 
were still land available for building on the uplands and only if there were no commensurate 
increase in the value of house property. In the absence of any information about the availability 
of alternative accommodation and the prospects of the building market we therefore lack sufficient 
reason for inferring what effect an all-round increase of income would have. Since our regression 
equation contains no information of this sort, it cannot legitimately lead us to forecast the effects 
of income change. 

We may now sum up as follows what we can legitimately mean by prediction in descriptive 
statistical enquiries : 


(a) a regression graph specifies a sub-sample representative value which we can circum- 
scribe by confidence limits by the methods of 17.03-17.04 on the assumption of a normal 
error distribution ; 

(b) of itself, the regression equation implies no information concerning the causal relation 
between the variates, and does not entitle us to make assertions concerning the results 
of human interference ; 


(c) even if we have additional sources of information to identify the relationship of the 
variates as one of antecedent to consequent, it is still necessary to remember that : 


(i) a regression equation describes occurrences in a specified framework of repetition ; 


(ii) assertions concerning the effects of human interference will not necessarily be true 
if the latter prescribes a different framework. 


In what we have discussed so far, all the emphasis has been on the distinction between 
the consequential domain of experiment and a concurrent domain which is amenable only 
to passive observation. The distinction has an implication which is worthy of more explicit 
comment from a viewpoint adumbrated in a passing remark to the effect that every sociological 
situation is a unique historical event. In the derivation of the significance tests of this chapter, 
we have assumed what we here call the principle of the fixed-A set. In other words, we view 
the situation from the viewpoint of what Churchill Eisenhart calls Model I, i.e. as one we 
can repeat at will in the same way. In a well-controlled laboratory set-up this is a meaningful 
assumption. It is at least permissible to doubt whether it has any meaning whatsoever in the 
domain of sociology. Admittedly, it will have one for those who can stomach Plato’s conception 
that the shadow world of human experience is but a sample from the infinite and eternally 
repetitious universe of universals. To others, its semantic credentials will be less patent. 


17.07 PARTIAL REGRESSION AND MULTIPLE CORRELATION 


Hitherto we have confined our attention to regression as a linear relation between two variates. 
Perfect linear regression of the B-score ($x_b$) on the A-score in a bivariate universe signifies that 
the mean B-score ($M_{b.a}$) associated with a particular A-score ($x_a$) is directly proportional to 
the latter, i.e. 

\[ M_{b.a} = k_{ba} \, x_a + C \quad \text{or} \quad M_{b.a} - M_b = k_{ba} (x_a - M_a) \]

If this is so, certain identities follow as tautologies of the grid which summarises the structure 
of the universe, in particular 

\[ \mathrm{Cov}(x_a, x_b) = k_{ba} V_a \quad \text{and} \quad V(M_{b.a}) = k_{ba}^2 V_a \]

In random sampling from the bivariate universe of the consequential domain, the unbiased and 
most efficient estimate ($k_{ba.s}$) of $k_{ba}$ has the same relation to the sample covariance and sample 
A-score variance, viz. : 

\[ k_{ba.s} = \frac{\mathrm{Cov}(x_{a.s}, x_{b.s})}{V_{a.s}} \]

These expressions constitute a particular case of a linear relation involving several variables. 
For our purpose it will suffice to illustrate the pattern by consideration of the case which arises 
when we prescribe the mean value ($M_{c.ab}$) of one score ($x_c$) in terms of particular values of two 
others ($x_a$ and $x_b$) connected therewith by the linear relation 

\[ M_{c.ab} = k_{ca.b} \, x_a + k_{cb.a} \, x_b + K \qquad \text{(i)} \]

The mean value of $M_{c.ab}$ is the mean C-score value ($M_c$) for the particular set of A-scores and 
B-scores which with them constitute the trivariate universe, whence we can eliminate the 
constant K in the usual way : 

\[ M_c = k_{ca.b} M_a + k_{cb.a} M_b + K, \]
\[ \therefore \; M_{c.ab} - M_c = k_{ca.b} (x_a - M_a) + k_{cb.a} (x_b - M_b), \]
\[ \therefore \; M_{c.ab} - M_c = k_{ca.b} X_a + k_{cb.a} X_b \qquad \text{(ii)} \]


Tautologies of Multiple Regression. Equations (i) and (ii) define the relation between the 
mean C-score and particular values of the A-scores and B-scores of a trivariate universe in which 
regression of the C-score on the other two scores is exactly linear. We can visualise such a universe 
in the idiom of Chance and Choice as the long-run result of a game of which the 3 recorded 
scores are those of: (i) the player C; (ii) an umpire A; (iii) an umpire B, as in the model of 12.01. 
To make the model as general as possible for our purpose we need not assume that the scores 
($x_a$ and $x_b$) of the umpires are independent. Thus we are free to regard them as correlated in 
virtue of the contribution of a third umpire to each of them. The rule is that the player adds 
to his individual (and hence independent) score ($x_e$) some multiple ($k_{ca.b} = k_a$ and $k_{cb.a} = k_b$) 
of the score of each of the umpires, so that 

\[ x_c = x_e + k_a x_a + k_b x_b \qquad \text{(iii)} \]

In terms of deviations from the score component mean values ($M_c$, $M_e$, $M_a$ and $M_b$) this is 
equivalent to 

\[ X_c = X_e + k_a X_a + k_b X_b \qquad \text{(iv)} \]

Since the individual score ($x_e$) of player C is independent of the umpires’ contributions, 
it can take any value for a fixed value of the A-score or the B-score, so that $M_{e.ab} = M_e$ is 
therefore constant and equivalent to K in (i). Thus regression of the C-score on the scores 
of the two umpires is linear. 
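If we multiply (iv) by $X_a$ or $X_b$ and take the mean value of the product, we at once derive 
as a grid tautology 

\[ \mathrm{Cov}(x_a, x_c) = \mathrm{Cov}(x_e, x_a) + k_a V_a + k_b \, \mathrm{Cov}(x_a, x_b) ; \]
\[ \mathrm{Cov}(x_b, x_c) = \mathrm{Cov}(x_e, x_b) + k_b V_b + k_a \, \mathrm{Cov}(x_a, x_b). \]

Since the player’s individual score is independent of that of either umpire, the long-run value 
of $\mathrm{Cov}(x_a, x_e)$ and $\mathrm{Cov}(x_b, x_e)$ is zero and 

\[ \mathrm{Cov}(x_a, x_c) = k_a V_a + k_b \, \mathrm{Cov}(x_a, x_b) \qquad \text{(v)} \]
\[ \mathrm{Cov}(x_b, x_c) = k_b V_b + k_a \, \mathrm{Cov}(x_a, x_b) \qquad \text{(vi)} \]

We can now eliminate $k_a$ or $k_b$, e.g. if we put 

\[ \mathrm{Cov}(x_a, x_b) \, \mathrm{Cov}(x_a, x_c) = k_a V_a \, \mathrm{Cov}(x_a, x_b) + k_b \, \mathrm{Cov}^2(x_a, x_b) ; \]
\[ V_a \, \mathrm{Cov}(x_b, x_c) = k_b V_a V_b + k_a V_a \, \mathrm{Cov}(x_a, x_b), \]
\[ \therefore \; k_b = \frac{\mathrm{Cov}(x_a, x_c) \, \mathrm{Cov}(x_a, x_b) - V_a \, \mathrm{Cov}(x_b, x_c)}{\mathrm{Cov}^2(x_a, x_b) - V_a V_b} \qquad \text{(vii)} \]

Similarly, 

\[ k_a = \frac{\mathrm{Cov}(x_b, x_c) \, \mathrm{Cov}(x_a, x_b) - V_b \, \mathrm{Cov}(x_a, x_c)}{\mathrm{Cov}^2(x_a, x_b) - V_a V_b} \qquad \text{(viii)} \]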
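
The elimination above lends itself to a quick arithmetical check. The following Python sketch is offered only as an illustration: the function name and the universe parameters are assumptions of ours, not figures from the text. It builds the two covariances of the C-score from (v) and (vi) for chosen coefficients and then recovers those coefficients by the two quotients.

```python
from fractions import Fraction


def partial_coefficients(v_a, v_b, cov_ab, cov_ac, cov_bc):
    """Return (k_a, k_b) by the quotients (viii) and (vii)."""
    denominator = cov_ab * cov_ab - v_a * v_b
    k_a = (cov_bc * cov_ab - v_b * cov_ac) / denominator
    k_b = (cov_ac * cov_ab - v_a * cov_bc) / denominator
    return k_a, k_b


# Hypothetical universe parameters, chosen only for illustration.
v_a, v_b, cov_ab = Fraction(4), Fraction(9), Fraction(3)
chosen_k_a, chosen_k_b = Fraction(2), Fraction(-1, 2)

# Covariances of the C-score with each umpire's score, from (v) and (vi).
cov_ac = chosen_k_a * v_a + chosen_k_b * cov_ab
cov_bc = chosen_k_b * v_b + chosen_k_a * cov_ab

k_a, k_b = partial_coefficients(v_a, v_b, cov_ab, cov_ac, cov_bc)
print(k_a, k_b)
```

Exact fractions are used so that the recovered values agree with the chosen ones without rounding error.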


For convenience at a later stage, we may write these results in terms of sums of square 
deviations or products. If the 3-dimensional grid contains n score triplets, we may write 

\[ S_{aa} = \sum_{j=1}^{j=n} X_{a.j}^2 \quad \text{and} \quad S_{ab} = \sum_{j=1}^{j=n} X_{a.j} X_{b.j} \qquad \text{(ix)} \]

The above expressions then take the form 

\[ k_a = \frac{S_{bc} \, S_{ab} - S_{bb} \, S_{ac}}{S_{ab}^2 - S_{aa} \, S_{bb}} \quad \text{and} \quad k_b = \frac{S_{ac} \, S_{ab} - S_{aa} \, S_{bc}}{S_{ab}^2 - S_{aa} \, S_{bb}} \qquad \text{(x)} \]

The partial regression coefficients $k_a$ and $k_b$ are expressible in terms of partial correlation 
coefficients. For brevity, we may first write 

\[ K_a = \frac{\sigma_a}{\sigma_c} k_a \quad \text{and} \quad K_b = \frac{\sigma_b}{\sigma_c} k_b \qquad \text{(xi)} \]

If we substitute for $\mathrm{Cov}(x_a, x_b)$ in (vii)-(viii) $r_{ab} \, \sigma_a \, \sigma_b$ and mutatis mutandis for $\mathrm{Cov}(x_a, x_c)$, 
$\mathrm{Cov}(x_b, x_c)$ we derive 

\[ K_a = \frac{r_{ac} - r_{ab} \, r_{bc}}{1 - r_{ab}^2} ; \quad K_b = \frac{r_{bc} - r_{ab} \, r_{ac}}{1 - r_{ab}^2} \qquad \text{(xii)} \]
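
As a check on the substitution, we may compare the coefficients computed from the three correlations with the regression coefficients rescaled by the ratio of standard deviations. The Python sketch below makes that comparison; the function name and every numerical value in it are hypothetical, the parameters being chosen merely to be consistent with (v) and (vi).

```python
import math


def standardised_coefficients(r_ab, r_ac, r_bc):
    """Return (K_a, K_b) from the three total correlations."""
    denominator = 1.0 - r_ab ** 2
    K_a = (r_ac - r_ab * r_bc) / denominator
    K_b = (r_bc - r_ab * r_ac) / denominator
    return K_a, K_b


# Hypothetical parameters (illustrative only): with k_a = 2 and k_b = -0.5
# these covariances satisfy (v) and (vi).
v_a, v_b, v_c = 4.0, 9.0, 16.0
cov_ab, cov_ac, cov_bc = 3.0, 6.5, 1.5
k_a, k_b = 2.0, -0.5

sd_a, sd_b, sd_c = math.sqrt(v_a), math.sqrt(v_b), math.sqrt(v_c)
r_ab = cov_ab / (sd_a * sd_b)
r_ac = cov_ac / (sd_a * sd_c)
r_bc = cov_bc / (sd_b * sd_c)

K_a, K_b = standardised_coefficients(r_ab, r_ac, r_bc)

# The same quantities obtained by rescaling k_a and k_b.
rescaled_a = sd_a / sd_c * k_a
rescaled_b = sd_b / sd_c * k_b
print(K_a, rescaled_a, K_b, rescaled_b)
```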

So far we have considered the regression of the C-score on the A-score and the B-score, 
in which case there is no ambiguity in the substitution $k_a = k_{ca.b}$. If our concern is with the 
regression of the A-score, we need to distinguish $k_{ca.b}$ from $k_{ac.b}$ in the corresponding regression 
equation from which we derive the former in the same way, and write more fully 

\[ K_{ca.b} = \frac{\sigma_a}{\sigma_c} k_{ca.b} \quad \text{and} \quad K_{ac.b} = \frac{\sigma_c}{\sigma_a} k_{ac.b} \]

Whence we obtain 

\[ K_{ca.b} \cdot K_{ac.b} = \frac{(r_{ac} - r_{ab} \, r_{bc})^2}{(1 - r_{ab}^2)(1 - r_{bc}^2)} \qquad \text{(xiii)} \]

Whence from (iv) of 12.08, we get 

\[ K_{ca.b} \cdot K_{ac.b} = r_{ac.b}^2 = k_{ca.b} \cdot k_{ac.b} \qquad \text{(xiv)} \]

Similarly, we may define the remaining 4 regression coefficients in terms of $r_{ab.c}$ and $r_{bc.a}$. 
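
The product relation can be verified numerically. The Python sketch below assumes that the partial correlation coefficient takes its customary form, i.e. the correlation of two scores with the third held constant; the function name and the three correlations are hypothetical values chosen for illustration.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of x with y when z is held constant (customary formula)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical total correlations (illustrative values only).
r_ab, r_ac, r_bc = 0.5, 0.8125, 0.125

# Standardised coefficient of the A-score in the regression of C on A and B ...
K_ca_b = (r_ac - r_ab * r_bc) / (1 - r_ab ** 2)
# ... and of the C-score in the regression of A on C and B.
K_ac_b = (r_ac - r_ab * r_bc) / (1 - r_bc ** 2)

r_ac_b = partial_correlation(r_ac, r_ab, r_bc)
print(K_ca_b * K_ac_b, r_ac_b ** 2)
```

The two printed figures agree, as the identity requires.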
The use and build-up of what it is customary to call the multiple correlation coefficient is 
easy to understand if we visualise the 2-dimensional grid of simple linear regression as a scatter 
diagram, i.e. a cloud of points on a graph. A product-moment index approaching unity then 
signifies that the points cluster closely round the line of best fit defined algebraically by the 
regression equation. We may express this conception formally in another way, as in the 
derivation of (xxi) in 11.04. To say that such a line gives a perfect fit to the data means that all 
the points lie on it; and this signifies a one-to-one correspondence of $x_b$ to $M_{b.a}$ for every 
value of $x_a$. If regression of $x_b$ on $x_a$ is indeed linear, interchanging the then equally spaced 
values of $M_{b.a}$ at the foot of each column of the score-frequency grid with the equally spaced 
values of the $x_a$ border-scores at the head of each column, is equivalent to a change of the origin 
and scale of the A-score distribution ; and we have seen (p. 353, Vol. I) that this does not affect 
the value of $r_{ab}$. In other words, the p-m correlation ($r_{bm}$) of $x_b$ with $M_{b.a}$ is the correlation 
($r_{ab}$) of $x_a$ with $x_b$ when regression is truly linear, and its numerical value is a yardstick of the 
good fit. Formally, we may express the identity thus 

\[ r_{bm} = \frac{E \, [X_b (M_{b.a} - M_b)]}{\sqrt{V_b \cdot V(M_{b.a})}} \]

In this expression linear regression implies (xi) in 11.04, i.e. that $V(M_{b.a}) = k_{ba}^2 V_a$ and 

\[ E \, [X_b (M_{b.a} - M_b)] = k_{ba} \, E(X_a X_b) = k_{ba} \, r_{ab} \sqrt{V_a V_b}, \]
\[ \therefore \; r_{bm} = r_{ab} \]
These considerations suggest that we may profitably explore the correlation between the 
actual value of $X_c$ and $(M_{c.ab} - M_c)$ in (ii) above as a criterion of satisfactory fit, i.e. how 
closely particular values of $X_c$ correspond to corresponding mean values on the assumption 
that 2 other variates are relevant. Accordingly, we define a multiple correlation coefficient 
for such a set-up as 

\[ R_c = \frac{\mathrm{Cov}(x_c, M_{c.ab})}{\sqrt{V_c \cdot V(M_{c.ab})}} \qquad \text{(xv)} \]

In (xv) the value of $V(M_{c.ab})$ follows from (ii), since 

\[ (M_{c.ab} - M_c)^2 = k_a^2 X_a^2 + k_b^2 X_b^2 + 2 k_a k_b X_a X_b, \]
\[ \therefore \; V(M_{c.ab}) = k_a^2 V_a + k_b^2 V_b + 2 k_a k_b \, \mathrm{Cov}(x_a, x_b) \qquad \text{(xvi)} \]

Similarly, 

\[ X_c (M_{c.ab} - M_c) = k_a (X_a X_c) + k_b (X_b X_c), \]
\[ \therefore \; \mathrm{Cov}(x_c, M_{c.ab}) = k_a \, \mathrm{Cov}(x_a, x_c) + k_b \, \mathrm{Cov}(x_b, x_c) \qquad \text{(xvii)} \]

Whence from (v) and (vi) 

\[ \mathrm{Cov}(x_c, M_{c.ab}) = k_a^2 V_a + k_b^2 V_b + 2 k_a k_b \, \mathrm{Cov}(x_a, x_b). \]

Hence from (xvi) 

\[ \mathrm{Cov}(x_c, M_{c.ab}) = V(M_{c.ab}). \]

Whence from (xv) : 

\[ R_c^2 = \frac{\mathrm{Cov}(x_c, M_{c.ab})}{V_c} = \frac{k_a \, \mathrm{Cov}(x_a, x_c) + k_b \, \mathrm{Cov}(x_b, x_c)}{V_c} = \frac{\sigma_a}{\sigma_c} k_a \, r_{ac} + \frac{\sigma_b}{\sigma_c} k_b \, r_{bc}, \]
\[ \therefore \; R_c = \sqrt{K_a \, r_{ac} + K_b \, r_{bc}} \qquad \text{(xviii)} \]

The analogy between the multiple correlation coefficient ($R_c$) and the product moment index $r_{ab}$ 
of the bivariate universe extends beyond the explanation given above. We may identify the 
true value of the C-score with the universe mean ($M_{c.ab}$), in which case, (iv) takes the form 

\[ X_c = (M_{c.ab} - M_c) + X_e, \]
\[ \therefore \; V_c = V(M_{c.ab}) + V_e = R_c^2 V_c + V_e, \]
\[ \therefore \; V_e = (1 - R_c^2) V_c. \]
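
The chain of identities leading from the definition of the multiple correlation coefficient to the residual variance may be followed with figures. The Python sketch below uses hypothetical universe parameters (none of them taken from the text) and checks that the covariance of the C-score with its conditional mean equals the variance of that mean, and that the residual variance is the fraction $(1 - R_c^2)$ of the total.

```python
import math

# Hypothetical universe (illustrative values only).
v_a, v_b, cov_ab = 4.0, 9.0, 3.0
k_a, k_b = 2.0, -0.5
v_e = 3.75  # variance of the player's independent score component

cov_ac = k_a * v_a + k_b * cov_ab  # (v)
cov_bc = k_b * v_b + k_a * cov_ab  # (vi)

# Variance of the conditional mean M_c.ab and total variance of the C-score.
v_m = k_a ** 2 * v_a + k_b ** 2 * v_b + 2 * k_a * k_b * cov_ab
v_c = v_m + v_e

# Covariance of the C-score with its conditional mean, then R_c from (xv).
cov_cm = k_a * cov_ac + k_b * cov_bc
R_c = cov_cm / math.sqrt(v_c * v_m)

residual = (1 - R_c ** 2) * v_c
print(cov_cm, v_m, R_c, residual)
```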
The Unbiased Estimates of the Coefficients. So far we have defined $k_a$ and $k_b$ as constants 
connecting particular values of $X_a$ and $X_b$ with the true value of $(M_{c.ab} - M_c)$. As with simple 
linear regression, we can define unbiased estimates of $k_a$ and $k_b$ if we look at the problem 
as a case of Churchill Eisenhart’s Model I (p. 548). We conceive that we are repeating 
observations of the C-scores on exactly the same set of A-scores and B-scores in each experiment 
of which we have a sample before us. In our model set-up the player’s individual score com- 
ponent ($X_e$) now takes the place of the error or residual source of random variation. We shall 
denote the operation of taking the mean within a single n-fold sample of score triplets by 
$E_s(\ldots)$ and the expected value of a sample parameter by $E(\ldots)$. Thus 

\[ \mathrm{Cov}(x_{a.s}, x_{c.s}) = E_s(X_a X_e) + k_a V_{a.s} + k_b \, \mathrm{Cov}(x_{a.s}, x_{b.s}), \]
\[ \therefore \; E \, \mathrm{Cov}(x_{a.s}, x_{c.s}) = E \, E_s(X_a X_e) + k_a V_{a.s} + k_b \, \mathrm{Cov}(x_{a.s}, x_{b.s}). \]

Now the expected value of $E_s(X_a X_e)$ is zero in virtue of the independence of the error com- 
ponent of the C-score; and $V_{a.s}$ is constant within the framework of the Model I set-up, 
as is also $\mathrm{Cov}(x_{a.s}, x_{b.s})$, so that 

\[ E \, \mathrm{Cov}(x_{a.s}, x_{c.s}) = k_a V_{a.s} + k_b \, \mathrm{Cov}(x_{a.s}, x_{b.s}). \]

Similarly, 

\[ E \, \mathrm{Cov}(x_{b.s}, x_{c.s}) = k_b V_{b.s} + k_a \, \mathrm{Cov}(x_{a.s}, x_{b.s}). \]

By elimination in the usual way we have 

\[ k_a = \frac{\mathrm{Cov}(x_{a.s}, x_{b.s}) \, E \, \mathrm{Cov}(x_{b.s}, x_{c.s}) - V_{b.s} \, E \, \mathrm{Cov}(x_{a.s}, x_{c.s})}{\mathrm{Cov}^2(x_{a.s}, x_{b.s}) - V_{a.s} V_{b.s}} \]
\[ = E \left[ \frac{\mathrm{Cov}(x_{a.s}, x_{b.s}) \, \mathrm{Cov}(x_{b.s}, x_{c.s}) - V_{b.s} \, \mathrm{Cov}(x_{a.s}, x_{c.s})}{\mathrm{Cov}^2(x_{a.s}, x_{b.s}) - V_{a.s} V_{b.s}} \right] \]

We may thus define the statistic which is an unbiased estimate of $k_a$ within the fixed set of 
A-scores and B-scores by the relations 

\[ E(k_{a.s}) = k_a \]

and 

\[ k_{a.s} = \frac{\mathrm{Cov}(x_{a.s}, x_{b.s}) \, \mathrm{Cov}(x_{b.s}, x_{c.s}) - V_{b.s} \, \mathrm{Cov}(x_{a.s}, x_{c.s})}{\mathrm{Cov}^2(x_{a.s}, x_{b.s}) - V_{a.s} V_{b.s}} \qquad \text{(xix)} \]

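In conformity with our derivation of the most efficient estimate of $k_{ba}$ in samples from a 
bivariate universe, we shall suppose that the C-score is divisible into two components, one ($x_z$) 
directly proportional to both $x_a$ and $x_b$ and the other a residual ($x_e$) whose mean square deviation 
($V_{e.s}$) from its sample mean value ($M_{e.s}$) is to be a minimum by appropriate choice of the 
linear constants $k_{a.s}$ and $k_{b.s}$, defining the relation of $x_z$ to the A-score and B-score. By 
definition therefore 

\[ x_c = x_z + x_e = k_{a.s} \, x_a + k_{b.s} \, x_b + x_e \qquad \text{(xx)} \]

We can eliminate the constant K (here the residual mean $M_{e.s}$) in the usual way, since 

\[ M_{e.s} = M_{c.s} - k_{a.s} M_{a.s} - k_{b.s} M_{b.s}. \]

Whence we can express (xx) in terms of the deviations of the score components from their 
mean values as 

\[ X_c = k_{a.s} X_a + k_{b.s} X_b + X_e \qquad \text{(xxi)} \]
\[ \therefore \; X_e^2 = X_c^2 + k_{a.s}^2 X_a^2 + k_{b.s}^2 X_b^2 - 2 k_{a.s} X_a X_c - 2 k_{b.s} X_b X_c + 2 k_{a.s} k_{b.s} X_a X_b, \]
\[ \therefore \; V_{e.s} = V_{c.s} + k_{a.s}^2 V_{a.s} + k_{b.s}^2 V_{b.s} - 2 k_{a.s} \, \mathrm{Cov}(x_{a.s}, x_{c.s}) - 2 k_{b.s} \, \mathrm{Cov}(x_{b.s}, x_{c.s}) + 2 k_{a.s} k_{b.s} \, \mathrm{Cov}(x_{a.s}, x_{b.s}), \]
\[ \therefore \; \frac{\partial V_{e.s}}{\partial k_{a.s}} = 2 k_{a.s} V_{a.s} - 2 \, \mathrm{Cov}(x_{a.s}, x_{c.s}) + 2 k_{b.s} \, \mathrm{Cov}(x_{a.s}, x_{b.s}) \]

and 

\[ \frac{\partial V_{e.s}}{\partial k_{b.s}} = 2 k_{b.s} V_{b.s} - 2 \, \mathrm{Cov}(x_{b.s}, x_{c.s}) + 2 k_{a.s} \, \mathrm{Cov}(x_{a.s}, x_{b.s}). \]

The condition which makes $V_{e.s}$ a minimum is that 

\[ \frac{\partial V_{e.s}}{\partial k_{a.s}} = 0 = \frac{\partial V_{e.s}}{\partial k_{b.s}}, \]
\[ \therefore \; k_{a.s} V_{a.s} = \mathrm{Cov}(x_{a.s}, x_{c.s}) - k_{b.s} \, \mathrm{Cov}(x_{a.s}, x_{b.s}) \qquad \text{(xxii)} \]

and 

\[ k_{b.s} V_{b.s} = \mathrm{Cov}(x_{b.s}, x_{c.s}) - k_{a.s} \, \mathrm{Cov}(x_{a.s}, x_{b.s}) \qquad \text{(xxiii)} \]

Our definition of the sample parameters which define the slope of the line of best fit in the 
sense that the residual variance is minimal thus corresponds to the unbiased estimate of the 
universe parameters $k_a$ and $k_b$. 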
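
That the two conditions of the minimum do single out the smallest residual variance is easy to confirm by trial. The Python sketch below uses a hypothetical sample of four score triplets (the figures and helper names are ours, chosen for illustration), solves the pair of equations for the sample coefficients, and checks that any small displacement of either coefficient increases the residual variance.

```python
from fractions import Fraction


def mean(values):
    return sum(values, Fraction(0)) / len(values)


def cov(xs, ys):
    """Mean product of deviations (divisor n); cov(xs, xs) is the variance."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])


# Hypothetical sample of four score triplets (illustrative only).
x_a = [Fraction(v) for v in (1, 2, 3, 4)]
x_b = [Fraction(v) for v in (2, 1, 4, 3)]
x_c = [Fraction(v) for v in (3, 5, 8, 9)]

v_a, v_b, c_ab = cov(x_a, x_a), cov(x_b, x_b), cov(x_a, x_b)
c_ac, c_bc = cov(x_a, x_c), cov(x_b, x_c)

# Solve the two minimum conditions for the sample coefficients.
det = v_a * v_b - c_ab * c_ab
k_a = (c_ac * v_b - c_bc * c_ab) / det
k_b = (c_bc * v_a - c_ac * c_ab) / det


def residual_variance(ka, kb):
    residuals = [c - ka * a - kb * b for a, b, c in zip(x_a, x_b, x_c)]
    return cov(residuals, residuals)


best = residual_variance(k_a, k_b)
step = Fraction(1, 10)
nearby = [residual_variance(k_a + da, k_b + db)
          for da in (-step, 0, step) for db in (-step, 0, step)
          if (da, db) != (0, 0)]
print(k_a, k_b, best, min(nearby))
```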

The student should be able to extend the foregoing derivations to regression involving 
more than 3 variables. When we have more than 2 regression coefficients to evaluate, it is 
preferable to solve the basic equations of the form exhibited in (v) and (vi) or (xxii) and (xxiii) 
by recourse to determinants. In the notation of sums of squares and products, the basic equations 
for a set-up involving four variables (regression of $x_d$ on $x_a$, $x_b$ and $x_c$) take the form * 

\[ S_{da} = k_a S_{aa} + k_b S_{ab} + k_c S_{ac}, \]
\[ S_{db} = k_a S_{ab} + k_b S_{bb} + k_c S_{bc}, \]
\[ S_{dc} = k_a S_{ac} + k_b S_{bc} + k_c S_{cc}. \]

Numerical Example. The following are 3 associated variables. 


            $x_a$     $x_b$     $x_c$ 

              5         2        21 
              3         4        21 
              2         2        15 
              4         2        17 
              3         3        20 
              1         2        13 
              8         4        32 

Totals       26        19       139 

Means       3.714     2.714    19.86 


* For computation short-cuts see Mordecai Ezekiel: Methods of Correlation Analysis. Wiley, 1941. 
For purposes of computation we may more conveniently work with raw-scores than with 
score deviations. We then write $s_i$ as the sum of the scores $x_i$, so that $s_i = n \cdot M_i$, and $s_{ii}$, $s_{ij}$ 
as the sum of the squares of $x_i$ and of the products $x_i \cdot x_j$ respectively. For an n-fold sample we 
then have 

\[ S_{ii} = s_{ii} - \frac{s_i^2}{n} \quad \text{and} \quad S_{ij} = s_{ij} - \frac{s_i \cdot s_j}{n} \qquad \text{(xxiv)} \]

We shall therefore need for the case of 3 variates ($x_a$, $x_b$, $x_c$) the following column total raw-scores 
($s_a$, $s_b$, $s_c$), square ditto ($s_{aa}$, $s_{bb}$, $s_{cc}$) and products ($s_{ab}$, $s_{ac}$, $s_{bc}$). 
Our first step is to tabulate $s_a$, $s_b$, etc., as below : 


         $x_a$   $x_b$   $x_c$   $x_a^2$   $x_b^2$   $x_c^2$   $x_a x_b$   $x_a x_c$   $x_b x_c$ 
           5       2      21       25         4       441        10         105          42 
           3       4      21        9        16       441        12          63          84 
           2       2      15        4         4       225         4          30          30 
           4       2      17       16         4       289         8          68          34 
           3       3      20        9         9       400         9          60          60 
           1       2      13        1         4       169         2          13          26 
           8       4      32       64        16      1024        32         256         128 
Total     26      19     139      128        57      2989        77         595         404 
         $s_a$   $s_b$   $s_c$   $s_{aa}$  $s_{bb}$  $s_{cc}$  $s_{ab}$   $s_{ac}$    $s_{bc}$ 

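From the above we obtain by recourse to (xxiv) 

\[ S_{ab} = 77 - \frac{(26 \cdot 19)}{7} = \frac{45}{7} ; \qquad S_{aa} = 128 - \frac{(26)^2}{7} = \frac{220}{7} ; \]
\[ S_{ac} = 595 - \frac{(139 \cdot 26)}{7} = \frac{551}{7} ; \qquad S_{bb} = 57 - \frac{(19)^2}{7} = \frac{38}{7} ; \]
\[ S_{bc} = 404 - \frac{(139 \cdot 19)}{7} = \frac{187}{7} ; \qquad S_{cc} = 2989 - \frac{(139)^2}{7} = \frac{1602}{7}. \]

We thus derive 

\[ 220 k_a + 45 k_b = 551, \]
\[ 45 k_a + 38 k_b = 187, \]
\[ \therefore \; k_a = \frac{12523}{6335} \quad \text{and} \quad k_b = \frac{16345}{6335}. \]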
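
The arithmetic of this example can be reproduced mechanically. The Python sketch below takes the seven score triplets as tabulated, forms the sums of squares and products of deviations from the raw totals as in (xxiv), and solves the two basic equations with exact fractions; the helper name `S` and the variable names are merely our own labels.

```python
from fractions import Fraction

x_a = [5, 3, 2, 4, 3, 1, 8]
x_b = [2, 4, 2, 2, 3, 2, 4]
x_c = [21, 21, 15, 17, 20, 13, 32]
n = len(x_a)


def S(xs, ys):
    """Sum of products of deviations, computed from raw sums as in (xxiv)."""
    raw = sum(x * y for x, y in zip(xs, ys))
    return raw - Fraction(sum(xs) * sum(ys), n)


S_aa, S_bb, S_cc = S(x_a, x_a), S(x_b, x_b), S(x_c, x_c)
S_ab, S_ac, S_bc = S(x_a, x_b), S(x_a, x_c), S(x_b, x_c)

# Solve S_ac = k_a S_aa + k_b S_ab and S_bc = k_a S_ab + k_b S_bb.
det = S_aa * S_bb - S_ab ** 2
k_a = (S_ac * S_bb - S_bc * S_ab) / det
k_b = (S_bc * S_aa - S_ac * S_ab) / det

print(k_a, float(k_a))
print(k_b, float(k_b))
```

Python prints the two fractions in lowest terms; they are equal to the quotients with denominator 6335.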


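Our regression equation for the predicted value ($x_z$) of $x_c$ is thus 

\[ x_z - \frac{139}{7} = \frac{12523}{6335} \left( x_a - \frac{26}{7} \right) + \frac{16345}{6335} \left( x_b - \frac{19}{7} \right). \]

If we write $x_{c.ab}$ as the error of $x_c$ for a fixed value of $x_a$ and $x_b$, the equation of the C-score 
distribution in terms of the estimated value of $k_a$ and $k_b$ is 

\[ X_c = k_a X_a + k_b X_b + X_{c.ab}, \]
\[ \therefore \; X_c^2 = k_a X_a X_c + k_b X_b X_c + X_{c.ab} X_c, \]
\[ \therefore \; V_c = k_a \, \mathrm{Cov}(x_a, x_c) + k_b \, \mathrm{Cov}(x_b, x_c) + \mathrm{Cov}(x_{c.ab}, x_c). \]

Alternatively, we may write 

\[ V_c = k_a^2 V_a + k_b^2 V_b + V_{c.ab} + 2 k_a k_b \, \mathrm{Cov}(x_a, x_b) + 2 k_a \, \mathrm{Cov}(x_a, x_{c.ab}) + 2 k_b \, \mathrm{Cov}(x_b, x_{c.ab}). \]

In any case, the fraction of variance explained by the dependence of the C-score on both 
the A-score and the B-score is 

\[ \frac{k_a \, \mathrm{Cov}(x_a, x_c) + k_b \, \mathrm{Cov}(x_b, x_c)}{V_c} = \frac{k_a^2 V_a + k_b^2 V_b + 2 k_a k_b \, \mathrm{Cov}(x_a, x_b)}{V_c}. \]

In this case 

\[ \frac{k_a S_{ac} + k_b S_{bc}}{S_{cc}} = \frac{12523 (551) + 16345 (187)}{6335 (1602)} = 0.9811. \]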


It is instructive to compare this result with the corresponding measure of explanation 
calculated on the assumption that the A-score distribution is the only relevant source of systematic 
variation. Our prediction equation is then 


\[ x_z - \frac{139}{7} = \frac{551}{220} \left( x_a - \frac{26}{7} \right). \]

On this assumption 

\[ \frac{k_{ca}^2 V_a}{V_c} = \frac{(551)^2}{220 (1602)} = 0.8614. \]
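
The two measures of explanation may be set side by side by direct computation. The Python sketch below starts from the sums of squares and products and the two coefficients already obtained for the seven triplets; the variable names are our own, and the two-variate figure is our own evaluation of the fraction defined above.

```python
from fractions import Fraction

# Sums of squares and products of deviations for the seven triplets
# (each is the quoted numerator over n = 7).
S_aa, S_cc = Fraction(220, 7), Fraction(1602, 7)
S_ac, S_bc = Fraction(551, 7), Fraction(187, 7)

# Partial regression coefficients from the two basic equations.
k_a, k_b = Fraction(12523, 6335), Fraction(16345, 6335)

# Fraction of the C-score variance explained by both other scores.
explained_by_both = (k_a * S_ac + k_b * S_bc) / S_cc

# Fraction explained when the A-score alone is taken into account.
k_ca = S_ac / S_aa
explained_by_a_alone = k_ca ** 2 * S_aa / S_cc

print(round(float(explained_by_both), 4))
print(round(float(explained_by_a_alone), 4))
```

On these figures the inclusion of the B-score raises the explained fraction from about 0.86 to about 0.98.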


* * * * * 


Significance of Individual Variates. The foregoing numerical example raises the question : 
how can we decide whether it is advantageous to take stock of an additional variable ? By the 
method of 17.03 we may obtain 3 independent estimates of error variances for n triplets on the 
assumption that variation w.r.t. $x_a$ and $x_b$ does not contribute to that of $x_c$ : 

\[ E(k_a \cdot S_{ac}) = \sigma_e^2, \]
\[ E(k_b \cdot S_{bc}) = \sigma_e^2, \]
\[ E \left( \frac{S_{cc} - k_a \cdot S_{ac} - k_b \cdot S_{bc}}{n - 3} \right) = \sigma_e^2. \]

Expected values of the statistics $k_a \cdot S_{ac}$ and $k_b \cdot S_{bc}$ will exceed $\sigma_e^2$ if there is significant regression. 
We thus derive two variance ratios as a basis for the commonly prescribed test of the significance 
of the contribution of one or other variate : 

\[ \frac{(n - 3) \, k_a S_{ac}}{S_{cc} - k_a S_{ac} - k_b S_{bc}} \quad \text{and} \quad \frac{(n - 3) \, k_b S_{bc}}{S_{cc} - k_a S_{ac} - k_b S_{bc}}. \]
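
For the seven triplets of the numerical example the two ratios may be evaluated as follows. This Python sketch takes each ratio to be the statistic $k \cdot S$ for one variate divided by the residual sum of squares over $(n - 3)$; that reading, and the variable names, are assumptions of the sketch rather than a prescription of the text.

```python
from fractions import Fraction

n = 7
S_cc, S_ac, S_bc = Fraction(1602, 7), Fraction(551, 7), Fraction(187, 7)
k_a, k_b = Fraction(12523, 6335), Fraction(16345, 6335)

# Residual estimate of the error variance on (n - 3) degrees of freedom.
residual_estimate = (S_cc - k_a * S_ac - k_b * S_bc) / (n - 3)

# Variance ratios for the contribution of the A-score and of the B-score.
ratio_a = k_a * S_ac / residual_estimate
ratio_b = k_b * S_bc / residual_estimate

print(float(residual_estimate), float(ratio_a), float(ratio_b))
```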


17.08 THE DISCRIMINANT FUNCTION 


If two classes A and B (e.g. males and females) differ w.r.t. several measurable attributes, any 
one such difference may be absolute in the sense that A’s measurement is always greater than 
that of B (or vice versa); but class differences are none the less genuine if expressible only in 
terms of averages. When this is so a single measurement has little diagnostic value. Thus 
the fact that the mean height of men is appreciably greater than that of women in the same 
community does not entitle us to assert with great confidence anything about the sex of an adult 
whose height is somewhat below the average for females. On the other hand, our assurance 
would be legitimately greater if we knew that several measurements (e.g. neck girth, hip width) 
made on the same individual lay nearer to the female than to the male population mean. 
In combining such observations the value of adding a new one will depend partly on whether 
the sampling variance is small or great and partly on whether it is or is not highly correlated 
with another already included in the test battery. The fact that some measurements will pay 
better dividends for diagnostic purposes than others therefore raises the issue: what is the 
best way of weighting each of the observations of the test battery when we combine them in 
a single index? To get the issue into focus it will suffice to consider a test battery involving 
only 2 measurements (U and V) as in the accompanying schema : 


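              Individual Values                                      Population Means 
        A            B            Difference                    A           B          Difference 
U   $x_{i.au}$   $x_{j.bu}$   $d_u = x_{i.au} - x_{j.bu}$   $M_{a.u}$   $M_{b.u}$   $M_{d.u} = M_{a.u} - M_{b.u}$ 
V   $x_{i.av}$   $x_{j.bv}$   $d_v = x_{i.av} - x_{j.bv}$   $M_{a.v}$   $M_{b.v}$   $M_{d.v} = M_{a.v} - M_{b.v}$ 


If we gave each type of measurement equal weight, the mean values of our diagnostic indices 
would be $\frac{1}{2}(M_{a.u} + M_{a.v})$ and $\frac{1}{2}(M_{b.u} + M_{b.v})$. Otherwise, we may represent them as 

\[ I_a = C_u \cdot M_{a.u} + C_v \cdot M_{a.v} \quad \text{and} \quad I_b = C_u \cdot M_{b.u} + C_v \cdot M_{b.v} \qquad \text{(i)} \]

For two individuals taken at random, one from each population, the corresponding sample values 
($S_a$ and $S_b$) and their difference (D) will then be 

\[ S_a = C_u \cdot x_{i.au} + C_v \cdot x_{i.av} ; \quad S_b = C_u \cdot x_{j.bu} + C_v \cdot x_{j.bv} \qquad \text{(ii)} \]
\[ D = S_a - S_b = C_u \cdot d_u + C_v \cdot d_v. \]

The mean value of D is then $M_d = C_u \cdot M_{d.u} + C_v \cdot M_{d.v}$. If we assume an approximately 
normal distribution of individual measurements, and hence of D itself, we may prefer to define it 
in such a way as to minimise its variance and hence the limits within which our estimate of D 
will lie at a prescribed confidence level. Our problem is then to specify $C_u$ and $C_v$ in conformity 
with this condition. We first note that the partial derivative of the square standard score, 

\[ \frac{\partial}{\partial C_u} \left[ \frac{(D - M_d)^2}{V} \right] = \frac{2 (D - M_d)}{V} \cdot \frac{\partial D}{\partial C_u} - \frac{(D - M_d)^2}{V^2} \cdot \frac{\partial V}{\partial C_u}. \]

The expression on the left vanishes when 

\[ \frac{\partial D}{\partial C_u} = \frac{(D - M_d)}{2V} \cdot \frac{\partial V}{\partial C_u}. \]

Thus to maximise the square standard score of the difference distribution we have to solve 
two equations : 

\[ \frac{\partial D}{\partial C_u} = \frac{(D - M_d)}{2V} \cdot \frac{\partial V}{\partial C_u} ; \quad \frac{\partial D}{\partial C_v} = \frac{(D - M_d)}{2V} \cdot \frac{\partial V}{\partial C_v}. \]

Whence we have 

\[ \frac{\partial D}{\partial C_u} \div \frac{\partial D}{\partial C_v} = \frac{\partial V}{\partial C_u} \div \frac{\partial V}{\partial C_v} \qquad \text{(iii)} \]

Now the variance of the D-distribution will be the sum of the variances ($V_{s.a}$ and $V_{s.b}$) of 
the distributions of $S_a$ and $S_b$ in (ii). If $V_{a.u}$ is the variance of the distribution of $x_{i.au}$ etc. : 

\[ V_{s.a} = C_u^2 \cdot V_{a.u} + C_v^2 \cdot V_{a.v} + 2 C_u \cdot C_v \, \mathrm{Cov}(x_{i.au}, x_{i.av}) ; \]
\[ V_{s.b} = C_u^2 \cdot V_{b.u} + C_v^2 \cdot V_{b.v} + 2 C_u \cdot C_v \, \mathrm{Cov}(x_{j.bu}, x_{j.bv}). \]

For brevity we may write 

\[ V_{a.u} + V_{b.u} = m_{uu} ; \quad V_{a.v} + V_{b.v} = m_{vv} \qquad \text{(iv)} \]
\[ \mathrm{Cov}(x_{i.au}, x_{i.av}) + \mathrm{Cov}(x_{j.bu}, x_{j.bv}) = m_{uv} \qquad \text{(v)} \]

We then have 

\[ V = V_{s.a} + V_{s.b} = C_u^2 \cdot m_{uu} + C_v^2 \cdot m_{vv} + 2 C_u \cdot C_v \cdot m_{uv}, \]
\[ \therefore \; \frac{\partial V}{\partial C_u} = 2 (C_u \cdot m_{uu} + C_v \cdot m_{uv}) \quad \text{and} \quad \frac{\partial V}{\partial C_v} = 2 (C_v \cdot m_{vv} + C_u \cdot m_{uv}). \]

Whence by substitution in (iii) 

\[ d_u (C_v \cdot m_{vv} + C_u \cdot m_{uv}) = d_v (C_u \cdot m_{uu} + C_v \cdot m_{uv}) \qquad \text{(vi)} \]

Since $m_{uu}$, $m_{uv}$ and $m_{vv}$ are parameters of the distributions of measurements, the expressions 
in parenthesis on each side of (vi) are constants and we may write 

\[ M_{d.u} (C_v \cdot m_{vv} + C_u \cdot m_{uv}) = M_{d.v} (C_u \cdot m_{uu} + C_v \cdot m_{uv}) \qquad \text{(vii)} \]

If we weight our diagnostic index in the usual way, $(C_u + C_v) = 1$. The values of both 
constants are then obtainable from (vii) in terms of the population mean differences, variances 
and co-variances. Actually, it is immaterial how we fix one of them, since the multiplication of 
D by a fixed constant does not affect the ratio of $D^2$ to V. Thus we can write $C_u = 1$ and 
solve accordingly. 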


Numerical example. For two measurements each made on 4 males and 4 females, the 
following will serve : 


                      U               V 
Individual         A      B        A      B 
     1            11     12        8      3 
     2            12     14       12      4 
     3            14     16       10      2 
     4            15     18       10      7 
   Mean           13     15       10      4 
Difference           - 2               6 

For this set-up 

\[ M_{d.u} = -2 ; \quad M_{d.v} = 6 ; \]
\[ V_{a.u} = 2.5 ; \quad V_{b.u} = 5.0 ; \quad V_{a.v} = 2.0 ; \quad V_{b.v} = 3.5 ; \]
\[ \mathrm{Cov}(x_{i.au}, x_{i.av}) = 0.5 ; \quad \mathrm{Cov}(x_{j.bu}, x_{j.bv}) = 2.5 ; \]
\[ m_{uu} = 7.5 ; \quad m_{uv} = 3 ; \quad m_{vv} = 5.5. \]

By substitution in (vii), if we put $C_u = 1$, 

\[ -2 (5.5 \, C_v + 3) = 6 (7.5 + 3 \, C_v), \]
\[ \therefore \; C_v = - \frac{51}{29}. \]

Thus our diagnostic index is 

\[ S = x_u - \frac{51}{29} x_v. \]

For the two populations its mean values are 

\[ I_a = 13 - \frac{10 (51)}{29} \simeq -4.6, \]
\[ I_b = 15 - \frac{4 (51)}{29} \simeq +8.0. \]
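
The figures of this example can be checked by direct computation. In the Python sketch below, variances and covariances are mean square deviations (divisor 4), which is the convention that reproduces $m_{uu} = 7.5$; the first U-score of class A (11) and the last V-score of class B (7) are values inferred from the stated means of 13 and 4, and the helper names are our own.

```python
from fractions import Fraction


def mean(values):
    return Fraction(sum(values), len(values))


def cov(xs, ys):
    """Mean product of deviations (divisor n, which reproduces m_uu = 7.5)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


# Class A and class B scores; the first U-score of A (11) and the last
# V-score of B (7) are inferred from the stated means of 13 and 4.
u_a, u_b = [11, 12, 14, 15], [12, 14, 16, 18]
v_a, v_b = [8, 12, 10, 10], [3, 4, 2, 7]

d_u = mean(u_a) - mean(u_b)
d_v = mean(v_a) - mean(v_b)
m_uu = cov(u_a, u_a) + cov(u_b, u_b)
m_vv = cov(v_a, v_a) + cov(v_b, v_b)
m_uv = cov(u_a, v_a) + cov(u_b, v_b)

# With C_u = 1, (vii) reads d_u (m_uv + C_v m_vv) = d_v (m_uu + C_v m_uv).
C_v = (d_v * m_uu - d_u * m_uv) / (d_u * m_vv - d_v * m_uv)

# Mean values of the index S = x_u + C_v x_v in the two classes.
I_a = mean(u_a) + C_v * mean(v_a)
I_b = mean(u_b) + C_v * mean(v_b)

print(C_v, float(I_a), float(I_b))
```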
* * * * 


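Generalisation of the Evaluation. In circumstances which call for the use of such a diagnostic 
index (discriminant function), we shall commonly have more than two measurements to combine, 
and we may write more generally for p of them 

\[ S = \sum_{n=1}^{n=p} C_n \cdot x_n \quad \text{and} \quad D = \sum_{n=1}^{n=p} C_n \cdot d_n \qquad \text{(viii)} \]
\[ \frac{\partial V}{\partial C_k} = 2 \sum_{n=1}^{n=p} C_n \cdot m_{kn} \]

Thus we have a set of equations of the form 

\[ M_{d.1} = \frac{D - M_d}{V} (C_1 \cdot m_{11} + C_2 \cdot m_{12} + C_3 \cdot m_{13} \ldots C_p \cdot m_{1p}) \]
\[ M_{d.2} = \frac{D - M_d}{V} (C_1 \cdot m_{12} + C_2 \cdot m_{22} + C_3 \cdot m_{23} \ldots C_p \cdot m_{2p}) \]
etc. etc. 

As before, $m_{nn}$ is the sum of the two variances of the nth measurement, $m_{nk}$ being the sum 
of the two covariances of the nth and kth. We are at liberty to solve by setting $C_1 = 1$, the 
constant $(D - M_d) \div V$ being then irrelevant, and our equations thus take the form 

\[ M_{d.n} \propto m_{n1} + C_2 \cdot m_{n2} + C_3 \cdot m_{n3} \ldots C_p \cdot m_{np} \qquad \text{(ix)} \]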


The foregoing treatment sets forth how we initially determine the values of the weights 
we employ to get the best diagnostic index embodying different sets of measurements. Having 
done so, we may use it as a classificatory device. Thus we determine S in (viii) for a doubtful 
specimen and assign the latter to class A if the numerical value obtained lies nearer to $I_a$, its 
A-class mean value. 
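
The whole procedure, from the determination of the weights to the assignment of a doubtful specimen, may be sketched in Python as below. The sketch solves the set of linear equations by elimination with exact fractions, rescales the solution so that the first weight is unity, and assigns a specimen to the class whose mean index lies nearer. The function names and the doubtful specimen are hypothetical; the matrix, mean differences and class means are those of the two-measurement example.

```python
from fractions import Fraction


def solve_linear(matrix, rhs):
    """Gauss-Jordan elimination with exact fractions (non-singular matrix assumed)."""
    size = len(rhs)
    rows = [[Fraction(v) for v in row] + [Fraction(b)]
            for row, b in zip(matrix, rhs)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(size):
            if r != col:
                factor = rows[r][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]


def discriminant_weights(m, mean_differences):
    """Weights proportional to the solution of m C = M_d, rescaled so that C_1 = 1."""
    raw = solve_linear(m, mean_differences)
    return [c / raw[0] for c in raw]


def classify(specimen, weights, class_means):
    """Assign the specimen to the class whose mean index lies nearer to its own."""
    index = sum(c * x for c, x in zip(weights, specimen))
    return min(class_means, key=lambda name: abs(index - class_means[name]))


# Two-measurement example: m_uu = 7.5, m_uv = 3, m_vv = 5.5; differences -2 and 6.
m = [[Fraction(15, 2), 3], [3, Fraction(11, 2)]]
weights = discriminant_weights(m, [-2, 6])
class_means = {"A": Fraction(-133, 29), "B": Fraction(231, 29)}

# A hypothetical doubtful specimen with U = 14 and V = 5.
print(weights, classify([14, 5], weights, class_means))
```

Solving for all the weights first and rescaling afterwards avoids having to decide in advance which equation to discard when one weight is fixed at unity.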


Addendum. Two of my younger colleagues who kindly read through the proofs of this chapter 
have expressed the misgiving that I may have overstressed the limitations of the Gaussian Theory 
of Errors as an instrument of research in biology and the social sciences. It is refreshing to 
recall the preface and argument of a still standard exposition of so-called regression. Therein 
Brunt (1917) states (The Combination of Observations) : 


The proof of the Normal Error Law has been based on Hagen’s hypotheses regarding 
errors of observation. In most of the problems of Astronomy, Geodetics, and Physics the 
errors of observation satisfy the hypotheses, and the application of least square methods is 
justified. But cases may arise in which particular care is necessary in applying these methods. 
This is especially true of Biological problems. For organic variability is the resultant of a 
large number of contributory causes, some of which may have a definite tendency to act 
always in one direction. The effect of such a bias is to produce an unsymmetrical frequency 
distribution, and the application of ordinary least square methods is then meaningless. 
It is thus in no way justifiable to regard Least Squares as a magical instrument 
applicable to all problems. 


I recommend Brunt's book to any biologist or sociologist who entertains a reasonable 
scepticism about the credentials of a theory of curve fitting based on the pioneer work of Gauss 
and Hagen in the thirties of the last century when so recently imported into a domain entirely 
foreign to their intentions. The first few chapters are well worth reading for another reason. 
On all sides, we now hear that science has relinquished the quest for absolute truth by embracing 
the doctrine that its laws are merely statistical. This is at best a half truth, unless we exclude 
all forms of taxonomical enquiry from the title to rank as science. Even so, it is profoundly 
misleading. Statistical is an epithet with at least five different meanings in current educated 
speech. In the context of the assertion cited, it covers: (a) a calculus of aggregates (e.g. the 
kinetic theory of gases or the genetical theory of populations); (b) the Gaussian calculus of 
errors of observation ; (c) a calculus of judgments. In this chapter we have seen reason for the 
doubt Brunt expresses and the need to re-examine the assumption that (b) and (c) have anything 
in common other than the algebraic devices they invoke. In Chapter 20 we shall see how little 
agreement exists w.r.t. assumptions common to (a) and (c). 

What is equally relevant to the current claim stated above is that the Gaussian theory (vide 
Brunt, p. 34, l. 4) presupposes the existence of a true value as a foothold for any meaningful 
definition of error as such. Within the framework of its assumptions this true value is the 
arithmetic mean of an infinite number of trials. It is important to realise that this is the only 
consideration relevant to a justifiable identification of the mean with the expected value. The 
interchangeability of the terms in current statistical writing (including this book) is misleading 
in any other context. Outside the Gaussian domain, the mean—like the variance—can claim 
no special semantic status in preference to other parameters more or less usefully invoked in 
the formulation of sampling distributions. 


CHAPTER 18 


ELEMENTS OF ANALYSIS OF COVARIANCE 
AND OF FACTOR ANALYSIS 


18.01 REGRESSION AS A STANDARDISING DEVICE 


In Chapter 17 we have seen that the least squares estimate of the constants of a linear law has 
a long history in the so-called exact sciences. On that account the concept of a physical law 
has cast—and still casts—a long shadow over the statistical theory of regression as applied to 
biological and sociological enquiries. From the viewpoint of the physicist, two issues are of 
paramount concern: (a) are the data of an experiment consistent with the coexistence of 
unavoidable experimental errors and of a law suggested by the data themselves or (and more 
often) by a particular hypothesis from a cognate domain of enquiry? (b) if so, what are the 
most reliable estimates of the definitive parameters, e.g. an elastic modulus or the E.M.F. of a 
standard cell ? 

The second question has in fact little meaning unless we predicate what is implicit in the 
statement of the first, i.e. that the major source of variation arises from random error of observa- 
tion, instrumental or personal. As we have seen, this is rarely, if ever, true of situations which 
arise in sociological enquiry ; and it is by no means always true in the domain of experimental 
biology. If we plot sociological and biological data in conformity with the traditional technique 
of least squares, we rarely do so to prescribe a figure comparable to a physical constant. We 
do so to decide whether some putative causal agency exerts a real influence or merely whether 
there is some causal nexus responsible for concomitant variation of different score sets. 

The student will experience little difficulty in appreciating this shift of interest, if we here 
digress to discuss a typical situation in which the biologist may invoke the technique of regression 
with more or less advantage. We shall suppose that we are investigating the response of 2 groups 
of animals on a different diet to one and the same drug. We have then to take stock of the fact 
that individuals of different size will not respond equally to the same dosage of the drug. In 
the absence of any diet effect, the administration of the same dose to each individual might 
therefore result in a group mean difference, since it would very rarely happen that the mean 
weights of the groups would be identical. The investigator can sidestep this pitfall in several 
ways : 

(a) by choosing animals of so nearly the same body weight that any such source of variation 

would be trivial ; 


(b) by pairing off individuals of nearly the same body weight in each group and by giving 
each pair the same dosage ; 


(c) by pre-adjustment of individual dosage based on previous knowledge concerning the 
relationship of dosage itself to body weight for a response of fixed magnitude ; 


(d) by using information gained in the course of the experiment to adjust the figures 
accordingly. 


The first is the ideal of the worker at home with his materials, but it is sometimes
impracticable. Some combination of (b) and (c) is then the best course to pursue; and (d), which
is a pis aller in laboratory enquiry, raises issues we shall explore more fully in connection with
the technique known as Analysis of Covariance. Essentially, the latter is a battery of significance
tests; but we can most easily understand the current claims for what they do test, if we consider
the relevance of information embodied in a regression equation to (c) and to (d).

In many situations, it will suffice to act in accordance with the assumption that response 
is directly proportional to dosage per unit body weight or per unit surface area in the range 
of size variation imposed by available stocks ; but it may be desirable to make a more precise 
adjustment or, at least, to test how much error such a procedure entails. The pilot investigation 
of the relationship will then be reducible to 2 variables, if it is practicable to produce a response 
of fixed magnitude by varying the dosage administered to one and the same individual. We 
may then plot values of the requisite dosage against body weight for individuals of different 
size. If this is not practicable, the aim of the enquiry will be to express the magnitude of the 
response (r) in terms of the joint variation of dosage (d) and body weight (w). Should this
relation be linear, it takes the form $r = k_1 d + k_2 w + C$. We thus have all the required
information to prescribe the correct dosage to evoke a fixed predetermined response, if we know the
weight of the animal and the numerical value of the 3 constants in the equation.
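
The prescription rule just described can be sketched in Python. Solving $r = k_1 d + k_2 w + C$ for
the dosage gives $d = (r - k_2 w - C)/k_1$. The function name and the numerical constants below are
hypothetical, chosen only for illustration; they are not figures from any enquiry cited here.

```python
def dose_for_response(r_target, w, k1, k2, c):
    """Dosage d expected to give response r_target at body weight w,
    assuming the linear law r = k1*d + k2*w + c holds exactly."""
    if k1 == 0:
        raise ValueError("response does not depend on dosage (k1 = 0)")
    return (r_target - k2 * w - c) / k1


# Hypothetical constants (assumptions for illustration only).
K1, K2, C = 2.0, -0.5, 1.0
d = dose_for_response(r_target=10.0, w=4.0, k1=K1, k2=K2, c=C)
print(d)  # dosage in whatever units were used to fit the constants
```

In practice the three constants would themselves be estimates from a pilot enquiry, so the
prescribed dosage carries their sampling error.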

The situation last mentioned involves the technique of multiple regression dealt with in
17.07. Here our concern will first be with the simpler case involving the relation of only 2
variables. When the relation is exactly linear, this takes the form $d = k_{dw} w + C$; and the
procedure for evaluating its slope constant ($k_{dw}$) is one with which we are now familiar. The
numerical value of the constant (C) definitive of the origin in terms of the d-score and w-score
means of our pilot enquiry then follows from definition. In our customary notation,


$$M_d = k_{dw} M_w + C, \qquad \therefore\; C = M_d - k_{dw} M_w \qquad \text{(i)}$$


When the relation is approximately linear, we may write the regression score (elsewhere denoted
Lis 4) in the form


$$d_c = k_{dw} w + C \qquad \text{(ii)}$$


Example 1. Regression is exactly linear for the following fictitious set of observations involving
weight (kilos) and dosage (milligrams) for a response of fixed magnitude:


             w       d      dw      w²
             2      12      24       4
             5      14      70      25
             8      16     128      64
            11      18     198     121
Totals      26      60     420     214
Means     13/2      15     105   107/2

$$\mathrm{Cov}(d, w) = 105 - \tfrac{13}{2}\cdot 15 = \tfrac{15}{2}; \qquad V_w = \tfrac{107}{2} - \left(\tfrac{13}{2}\right)^2 = \tfrac{45}{4}; \qquad k_{dw} = \tfrac{15}{2} \div \tfrac{45}{4} = \tfrac{2}{3}.$$


The required linear relation is therefore


$$d = \tfrac{2}{3}(w + 16); \qquad C = 10\tfrac{2}{3}.$$
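
The arithmetic of Example 1 can be checked with a short Python sketch. It uses exact fractions so
that the slope and origin constant come out as 2/3 and 32/3 rather than rounded decimals. The
function name `linear_fit` is an illustrative choice, and the first pair (w = 2, d = 12) is the one
implied by the column totals of the table.

```python
from fractions import Fraction


def linear_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept,
    with slope = Cov(x, y) / V(x) and intercept = M_y - slope * M_x."""
    n = len(xs)
    mx = Fraction(sum(xs), n)
    my = Fraction(sum(ys), n)
    cov = Fraction(sum(x * y for x, y in zip(xs, ys)), n) - mx * my
    var = Fraction(sum(x * x for x in xs), n) - mx * mx
    slope = cov / var
    return slope, my - slope * mx


w = [2, 5, 8, 11]      # body weight (kilos), Example 1
d = [12, 14, 16, 18]   # dosage (milligrams), Example 1
k_dw, c = linear_fit(w, d)
print(k_dw, c)  # 2/3 32/3
```

Because the relation is exactly linear, every observed dosage equals `k_dw * w + c` for its weight.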
Example 2. For the following figures regression is roughly linear:


             w       d      dw      w²     Observed d     Calculated d
             3       9      27       9          9             10
             3      10      30       9         10             10
             4      11      44      16         11             10.7
             5      13      65      25         13             11.4
             7      12      84      49         12             12.75
Total       22      55     250     108


Having conducted a pilot enquiry involving p paired observations to evaluate the constants
in (ii) above (which will, of course, commit us to a test of linearity), we can state the approximate
margin of error entailed in assigning the correct value to d by recourse to the formula of (xxi)
in 17.04, viz.:


Lae | (ii) 


1 
[rly eM 
pp Vo 
If we denote the actual dosages for a given body weight in the standard graph of our pilot experi- 
ment by d,;, we define sí for p paired values in all and c different values of w in accordance 
with (xv) of 17.03 by the equation 


DP ptes 
S= —— (d; — d,. ¿y? et) 
Poe iat 

By recourse to (iii) and (iv) we can assess the legitimate confidence with which we may 
adjust dosage to body weight in advance in accordance with (c) above. The end in view is 
characteristic of numerous situations in which it is our concern to eliminate a known source 
of variation otherwise likely to vitiate a judicious evaluation of another ; but the task does not 
always admit of disposal by recourse to results obtained once and for all by a pilot enquiry of 
the sort illustrated above. In some circumstances the existence of an additional contributory 
source of variation may not be apparent beforehand. It may then be possible to be wise after 
the event. If we do suspect the existence of a second source of variability irrelevant to our 
end in view, we may design our experiment to provide us simultaneously with sufficient informa- 
tion about its contribution to justify a confident answer to the main question. 

When our concern is indeed to evaluate 2 or more sources of variation within the frame- 
work of one and the same set of observations, it may happen that each (e.g. dose and size or 
percentage literacy and family income) is expressible quantitatively by reference to a system 
of scores, and that each is conceivably contributory to a quantifiable response (e.g. hours of sleep 
or infant mortality). If so, we may separately assess their effects by the method of multiple 
regression (vide 17.07) or, in a less satisfactory way, by recourse to partial correlation (vide 
9.04 in Vol. I). In the symbols of an earlier paragraph (p. 765) our empirical regression
equation connecting response with two other variates such as dosage and body weight is
$r = k_1 d + k_2 w + C$. Our statistical problem is whether $k_1$ and $k_2$ each (or either) significantly
differ from zero, i.e. whether we increase the precision of our estimate (r) by including both d
and w in the equation.

If one source of variation (e.g. hair colour) is conveniently expressible only in qualitative 
terms, neither of the procedures last named is applicable, and we have to employ some other 
means of summarising our data in a form exhibiting the separate effects of the relevant variables. 
It is customary to speak of such refined summarisation as standardisation ; and it may help the 
reader to appreciate the use of regression as a standardising device if we here recall a method 
of standardisation commonly used in vital statistics, especially to forestall erroneous conclusions 
suggested by comparison of crude mortality or crude morbidity statistics of communities of 
different age composition. Needless to say, the risk of death or disease differs widely at 
different ages. Consequently, mortality or morbidity rates of two different populations may 
differ considerably if one has a very high proportion of very old or of very young persons, 
and the other a very high proportion of individuals in middle life. There is a simple way of 
taking stock of this difference, if we have access to the appropriate rates separately recorded 
for different years (or other appropriate interval) of life, and if we also know what is the
proportionate contribution of each such age group to each total population. We then proceed as
follows. We first construct a population of standard age structure preferably based on an 
average figure for the test populations, e.g. by pooling the number per thousand per corre- 
sponding age group and dividing the total by the number of test populations. We then calculate 
what the mortality or morbidity rate would be if each test population had the same age structure 
as the standard one, i.e. by weighting the appropriate rate for each age group in the test 
population by the proportion of persons of the same group in the standard one, the weighted 
total then being our standard rate. 
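
The procedure of the foregoing paragraph can be sketched in Python as follows. All figures are
hypothetical: two populations are given the same age-specific rates but different age structures,
so that their crude rates differ while their standardised rates agree. The function names are
illustrative assumptions, not established terminology.

```python
def standard_structure(structures):
    """Average the age structures (numbers per thousand in each age group)
    of the test populations to obtain the standard population."""
    n = len(structures)
    return [sum(column) / n for column in zip(*structures)]


def weighted_rate(age_rates, structure):
    """Weight each age-specific rate by the share of that age group."""
    return sum(r * s for r, s in zip(age_rates, structure)) / sum(structure)


# Hypothetical data: three age groups (young, middle, old).
rates_a = [2.0, 5.0, 40.0]        # deaths per thousand in each age group
rates_b = [2.0, 5.0, 40.0]
structure_a = [500, 400, 100]     # persons per thousand in each age group
structure_b = [300, 400, 300]

standard = standard_structure([structure_a, structure_b])
crude_a = weighted_rate(rates_a, structure_a)
crude_b = weighted_rate(rates_b, structure_b)
std_a = weighted_rate(rates_a, standard)
std_b = weighted_rate(rates_b, standard)
print(crude_a, crude_b)   # crude rates differ because age structure differs
print(std_a, std_b)       # standardised rates agree
```

Here the whole of the crude difference is attributable to age composition, which is the situation
standardisation is designed to expose.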

One use of regression is essentially like this. We might rely on our previous illustration 
to exemplify it; but it will be better to consider a situation in which its use is more plausible. 
Accordingly, we shall suppose that we cannot easily control the food intake of the test animals 
(rats) of 2 groups (I and II) which respectively receive the same ration with (I) or without (II)
addition of a small fixed quantity of an ingredient of negligible calorie value, our aim being to assess 
the effect of the latter on growth (i.e. body weight increment during a fixed period). In such 
a set-up we might have reason to suspect that the rats eat more on diet I, and our figures may 
confirm this, if we record the total (or mean daily) food consumption of each animal. Since, 
as our figures for rats on one and the same diet will show, growth depends on food intake, we 
have then to distinguish between two possibilities : (a) the only effect of diet II is to stimulate 
the appetite of the rat ; (b) diet II has also a specific effect in the absence of any increase of total 
food consumption. A specific effect in this context may signify either (or both) of two group 
differences: (i) the group means, and hence the value of the constant C' in (ii) above, are 
different ; (ii) the group regression constants are different. If the group regression constants 
are identical we shall say that regression is uniform, and we shall postulate that this is so in 
what follows next. 

Now our figures for a short fixed period may well show that regression of growth ($x_b$) on
food intake ($x_a$) is approximately linear and we shall assume that this is so. For simplicity,
we may first assume that the linear relation is exact, and on this assumption we shall examine
the consequences of the hypothesis that diet II has no specific effect. If this is so, we can
therefore express the relation between growth and intake of both groups by an equation of the
form


$$x_b = k_{ba} x_a + C \qquad \text{(v)}$$
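GRAPHICAL REPRESENTATION OF LINEAR REGRESSION

[Figure: regression lines for Group I, Group II and Group III, each plotted with its own symbol.]
This figure refers to three sets of paired scores exhibiting an exactly linear relationship.


              Group I                   Group II                  Group III
          a     b    ab    a²       a     b    ab    a²       a     b    ab    a²
          2    14    28     4       4    18    72    16       7     4    28    49
          3    16    48     9       7    24   168    49       8     6    48    64
          5    20   100    25      10    30   300   100      11    12   132   121
          6    22   132    36      12    34   408   144      13    16   208   169
          8    26   208    64      13    36   468   169      15    20   300   225
         11    32   352   121      14    38   532   196      18    26   468   324
Total    35   130   868   259      60   180  1948   674      72    84  1184   952


The determination of the regression lines from the appropriate statistical parameters is as follows:


Group I. $V_a = \frac{6(259) - (35)^2}{36} = \frac{329}{36}$; $\mathrm{Cov}(a, b) = \frac{6(868) - (35)(130)}{36} = \frac{658}{36}$; $k_{ba} = 2$;
$C = M_b - k_{ba} M_a = \frac{130}{6} - 2 \cdot \frac{35}{6} = 10$.

Group II. $V_a = \frac{6(674) - (60)^2}{36} = \frac{444}{36}$; $\mathrm{Cov}(a, b) = \frac{6(1948) - (60)(180)}{36} = \frac{888}{36}$; $k_{ba} = 2$;
$C = M_b - k_{ba} M_a = 30 - 2(10) = 10$.

Group III. $V_a = \frac{6(952) - (72)^2}{36} = \frac{528}{36}$; $\mathrm{Cov}(a, b) = \frac{6(1184) - (72)(84)}{36} = \frac{1056}{36}$; $k_{ba} = 2$;
$C = M_b - k_{ba} M_a = 14 - 2(12) = -10$.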
We may denote the corresponding means of the two groups by $M_{a.1}$, $M_{b.1}$ and $M_{a.2}$, $M_{b.2}$.
By hypothesis, $k_{ba.1} = k_{ba.2} = k_{ba}$, i.e. the regression coefficients are the same for each group.
Hence we have


$$M_{b.1} = k_{ba} M_{a.1} + C \quad\text{and}\quad M_{b.2} = k_{ba} M_{a.2} + C \qquad \text{(vi)}$$


Within the framework of the foregoing assumptions, i.e. that regression is linear and that the
regression coefficients are identical, the two B-score means defined by (vi) will therefore be
equal if, and only if, $M_{a.1} = M_{a.2}$. Thus they will be the same if each group experiences
the same mean food intake ($M_a$) as the pooled assemblage of both groups. We may therefore
define standardised (or adjusted) means ($M_{bs.1}$ and $M_{bs.2}$) by the relations


$$M_{bs.1} = k_{ba} M_a + C = M_{bs.2}.$$

From (vi) above

$$M_{bs.1} = M_{b.1} + k_{ba}(M_a - M_{a.1}) \qquad \text{(vii)}$$

Similarly:

$$M_{bs.2} = M_{b.2} + k_{ba}(M_a - M_{a.2}) \qquad \text{(viii)}$$


The standardised group mean growth score is therefore obtainable by adding to the crude
group mean the product of the group regression coefficient and the difference between the food-intake
grand mean of the pooled sample and the group food-intake mean.

Let us now suppose that diet II does have a specific additive effect in addition to its
non-specific action on appetite as shown by the fact that $M_{bs.2} > M_{bs.1}$. If we denote as $F_2$ the
growth increment due to this specific food factor, our equations of score components become


$$x_{b.1} = k_{ba} x_{a.1} + C \quad\text{and}\quad x_{b.2} = k_{ba} x_{a.2} + C + F_2,$$
$$\therefore\; M_{bs.2} = k_{ba} M_a + C + F_2 = M_{bs.1} + F_2 \qquad \text{(ix)}$$


If diet II has a specific effect, the standardised group growth mean will therefore be greater
than that of group I. A numerical example will assist to clarify the foregoing arguments.


* * * * * *


Numerical Example. Table 1 shows 3 series of p (= 4) paired scores, regression being exactly
linear for each series. The reader may check as an exercise the values given for the regression equations
(b = 3a + 5) which are identical for the first two series. The slope ($k_{ba} = 3$) for the third is the same
as for the other two, but the origin (C = 10) of the distribution is different. This is equivalent to adding
a specific factor $F_3 = 5$ to each B-score computed from the regression equation of the other 2 series.


TABLE 1

              Series I                  Series II                 Series III
          a     b    ab    a²       a     b    ab    a²       a     b    ab    a²
          4    17    68    16       8    29   232    64       9    37   333    81
          5    20   100    25       9    32   288    81      10    40   400   100
          7    26   182    49      10    35   350   100      12    46   552   144
         10    35   350   100      12    41   492   144      15    55   825   225
Totals   26    98   700   190      39   137  1362   389      46   178  2110   550

         M_b = 24.5; k_ba = 3       M_b = 34.25; k_ba = 3     M_b = 44.5; k_ba = 3
         M_a = 6.5; C = 5           M_a = 9.75; C = 5         M_a = 11.5; C = 10
         b = 3a + 5                 b = 3a + 5                b = 3a + 10

              I and II pooled (p = 8)           I, II, III pooled (p = 12)
               a      b     ab     a²            a      b     ab     a²
I             26     98    700    190           26     98    700    190
II            39    137   1362    389           39    137   1362    389
III                                             46    178   2110    550
Total         65    235   2062    579          111    413   4172   1129

         M_b = 29.375; k_ba = 3                M_b = 34.42; (k_ba ≈ 3.44)
         M_a = 8.125; C = 5                    M_a = 9.25; (C ≈ 2.6)


If we pool I and II, we obtain the pooled mean $M_a = 8.125$, whence we arrive at the following result
(Table 2):


TABLE 2
          M_b crude     M_b standardised
I          24.5         24.5 - 3(6.5 - 8.125) = 29.375
II         34.25        34.25 - 3(9.75 - 8.125) = 29.375


If we now standardise the scores by reference to the value of $M_a$ (= 9.25) for the entire pool of
data, we have


          M_b crude     M_b standardised
I          24.5         24.5 - 3(6.5 - 9.25) = 32.75
II         34.25        34.25 - 3(9.75 - 9.25) = 32.75
III        44.5         44.5 - 3(11.5 - 9.25) = 37.75
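
The standardised means tabulated above can be reproduced with a short Python sketch. It applies
the rule of (vii) and (viii), i.e. crude mean plus the common slope times the difference between
the pooled A-score mean and the group A-score mean, to the three series of Table 1. The function
names are illustrative; the common slope (3) is taken as given rather than estimated.

```python
def mean(xs):
    return sum(xs) / len(xs)


def adjusted_means(groups, k_ba):
    """groups maps a label to (a_scores, b_scores).
    Returns, for each group, M_b + k_ba * (M_a_pooled - M_a_group)."""
    pooled_a = [a for a_scores, _ in groups.values() for a in a_scores]
    grand = mean(pooled_a)
    return {label: mean(b) + k_ba * (grand - mean(a))
            for label, (a, b) in groups.items()}


series = {
    "I": ([4, 5, 7, 10], [17, 20, 26, 35]),
    "II": ([8, 9, 10, 12], [29, 32, 35, 41]),
    "III": ([9, 10, 12, 15], [37, 40, 46, 55]),
}

two = adjusted_means({k: series[k] for k in ("I", "II")}, k_ba=3)
three = adjusted_means(series, k_ba=3)
print(two)    # I and II standardised on their own pooled mean
print(three)  # all three series standardised on the mean of the entire pool
```

Series I and II agree under either standard, and series III exceeds them by 5 when all three are
standardised together.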




Thus the standardised B-scores for the first 2 series of our numerical example are identical
whether we standardise them for comparison with one another alone or for comparison with
the third. The standardised B-score for III exceeds them by the series factor ($F_3 = 5$). This
must be so, as we see if we write the equations

$$b_1 = k_{ba} a_1 + C = 3a_1 + 5 + 0; \qquad M_{b.1} = 3M_{a.1} + 5;$$
$$b_2 = k_{ba} a_2 + C = 3a_2 + 5 + 0; \qquad M_{b.2} = 3M_{a.2} + 5;$$
$$b_3 = k_{ba} a_3 + C + F_3 = 3a_3 + 5 + 5; \qquad M_{b.3} = 3M_{a.3} + 10.$$

Whence the standardised mean B-scores are

I. $M_{b.1} - 3(M_{a.1} - M_a) = 3M_{a.1} + 5 - 3(M_{a.1} - M_a) = 3M_a + 5$.
II. $M_{b.2} - 3(M_{a.2} - M_a) = 3M_{a.2} + 5 - 3(M_{a.2} - M_a) = 3M_a + 5$.
III. $M_{b.3} - 3(M_{a.3} - M_a) = 3M_{a.3} + 10 - 3(M_{a.3} - M_a) = 3M_a + 10$.


In terms of assessment of treatment we may sum up the foregoing remarks about standardisa- 
tion as follows. We suppose that we have before us, for each of several treated groups, paired 
values of the responses (b) of different individuals and of some correlated score (a). Our A-score 
means vary from group to group and our aim is to assess how far this circumstance suffices 
to account for the treatment group mean differences w.r.t. the response itself. In the absence 
of any residual source of variation, we may say that 


(i) group means adjusted by the method described above will be identical, if treatment
per se has no effect;

(ii) if treatment exerts an independent specific effect, being then such as to shift the origin
of the regression from C to C + F, its influence will appear as an increment (or decrement,
if F is negative) numerically equivalent to F.


In biological and sociological enquiry, it is, of course, impossible to exclude residual sources 
of variation affecting the responses of individuals or communities; but we can sometimes 
justifiably assume that their collective effect is random in the sense that positive and negative 
deviations from the regression mean resulting therefrom cancel out in the long run. In practice, 
therefore, standardising our data by recourse to the regression equation is unlikely to yield 
adjusted means which are exactly equal when treatment has no effect. What it can do is to get 
the meaning of the crude data into sharper focus. If the effect of standardisation is to reduce 
the group mean differences very noticeably, we have reason to suspect that the residual differences 
are attributable to random residual variation, being insignificant in that sense. Having removed 
the effect of the uncontrolled variable A, we have thus to ascertain whether the residual variation 
is still accountable without invoking the assumption that treatment is efficacious. This is the 
major objective of the statistical technique known as analysis of covariance. 


18.02 ANALYSIS OF COVARIANCE 


The need for a technique such as analysis of covariance arises in circumstances when : 


(a) we wish to determine whether some qualitative criterion of classification, e.g. treatment, 
significantly contributes to the variation of a score B, e.g. gain of body weight ; 

(b) we also have reason to believe that the score B depends in part at least on another 
variable A, e.g. food intake, which is not under direct control and is therefore inconstant 
w.r.t. groups distinguished by the criterion of classification and unlikely to have the 
same mean value in any two of them. 




When it is not indeed possible to eliminate the effect of such variation a true effect associated 
with the criterion of classification signifies that the relation between A and B in different groups 
is not the same. If we can score A so that regression of B on A is linear, a true difference may 
show up in either or both of two ways : 


(i) regression is not uniform, i.e. the regression coefficients are not all identical ; 


(ii) there is a specific and group effect, i.e. regression lines do not have the same origin. 
For either or both reasons the adjusted means will in general be different. 


To assess the significance of adjusted means we thus need two different tests which bear 
directly on the issue raised above ; but the performance of either presupposes that we can safely 
assume regression within the groups to be linear. This would raise no new issue, if we were 
free to pick and choose our A-score values, as we can do when they are amenable to direct control ; 
but if so, we could design our enquiry without raising the problem we now face. Otherwise, 
the test for linearity based on (xiv) of 17.04 may fail us, because the F-ratio is indeterminate 
when each different B-score value within a group goes with a different value of the A-score. 

In laboratory practice, we shall rarely be concerned with comparison of more than 2 treatment 
procedures at once, but in certain types of trials it may be advantageous to deal with more than 
two groups of paired scores. We shall therefore regard the comparison of 2 groups as a 
particular case of a more general pattern. We may visualise the lay-out (Table 1) for 3 groups 
as below: 


scores 


TABLE 1 
Group I H III 
A B A B A B 
TE âii Bis Biv Sis 1.3 b.s 
02.1 boy Az.2 Do.o As.3 bs.3 
di bs. 3.2 Des | A3.3 Ds.3 
ars das. Gea, aes ba. | 
des bs... | | 
Means Mos M5. Mis M,.> | Mi-a | M,.: 
peat 
No. of 
paired pS bs =5 DP, =4 


The essentially new question such a table prompts us to ask is whether there is a group 
effect. If so, we may also ask, is this because regression is not uniform or because the group 
effect is additive if regression is indeed uniform as defined above ? It may also be useful in 
certain circumstances to refute the suspicion that variation w.r.t. the A-score per se contributes 
anything appreciably to variation w.r.t. the B-score. This calls for the addition of another 
test to the battery. The entire battery of appropriate significance tests is a sequence in which 
the answer obtained from one decides whether it is worth while to ask the next. We may list 
them in this order : 
(i) if regression is linear, is there a group effect of either sort?
(ii) if so, is regression uniform from group to group?
(iii) if so, is there an additive group effect?
(iv) is there true within-group regression?


The logical order of procedure is a little puzzling to the beginner to whose difficulties the 
practice of exhibiting it against a background of elaborate computations adds needlessly. The 
necessary computations for the tests are very laborious, and the arithmetical order of procedure 
involves short cuts which have nothing to do with logical precedence. It is permissible to 
wonder how any student first confronted with paradigms chosen from agricultural trials or the 
like can hope to emerge from such a maze with any clear conception of the framework of assump- 
tions relevant to correct application of the technique. 

As in the foregoing examination of significance tests for regression estimates, the procedure 
prescribed is: 


(i) to formulate independent estimates of the true variance of the putative common universe
of e-score (residual) components with a view to the use of the F-test in accordance
with principles by now familiar;

(ii) to employ as the denominator of such a variance ratio (F) a yardstick statistic which
necessarily depends on residual variation ($\sigma_e^2$) alone;

(iii) to employ as the numerator of the F-ratio a statistic whose expected value will exceed
$\sigma_e^2$, if the null hypothesis is false.


In what follows we proceed in the same way with this qualification. It may happen that 
we can formulate an independent statistic whose expected value will be less than that of 
the yardstick statistic if the null hypothesis is false. To use the F-table intelligently we 
must then employ the former as the denominator of the F-ratio and the latter as the 
numerator. 

Notation. In defining appropriate expressions for the numerator or denominator of the 
F-ratio, we have had to assume that we are sampling in accordance with the principle of the 
fixed-A set. In this context, the principle presupposes a doubly stratified universe, since we 
have to assume that the A-score distribution is fixed for each set of paired scores as in the sample. 
If this is clear we may drop the subscripts c and s except when we need to distinguish the true 
value (R;,,) from the sample value (k;,.,,) of the regression coefficient. Our code will be as 
follows : 

Pooled Sample Within the kth sub-sample 


No. of paired scores . P Pr 

Mean A-score . ; ; ; M, + A 
A-score Variance l i ; e woe 
Mean B-score . M, Mo. 
B-score Variance > : s V, Va. 


For h sets of paired scores we may designate the operation of extracting a mean value as 


O IN 


k=1 


E l 
2 p BPE ‘) =p —ch and p.B{—) =h ao ay es 


k 
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For the operation of extracting the within-set mean of all p, values we may likewise write 


u= Py 


È (== Ed.) 


For the expected value of a parameter W within the fixed-A set as a whole we shall use E,(W), 
so that the expected value of the mean e-score variance within the set is 
Le . M(V.. h) — E, . E (Ve. n) == E, . LAY «. n)» 


s. B, MV.. n) = BÈ 02) 0 ERC E 
h 


To complete our code we must make explicit the putative components of the B-scores.
If there is uniform regression and no group effect associated with our qualitative criterion of
classification, i.e. all sets of paired scores come from the same bivariate universe, we may write


$$b = e + F_a + C.$$


If there is a group effect either $F_a$ or $C$ varies from group to group, indeed both may do so;
and we may distinguish


$b = e + F_a + C_h$ : regression uniform, additive group effect present;
$b = e + F_{a.h} + C$ : regression coefficient variable from group to group, no other group effect;
$b = e + F_{a.h} + C_h$ : regression coefficient variable and additive group factor present.


If regression is also linear we may write $F_a = k_{ba} \cdot a$ or $F_{a.h} = k_{ba.h} \cdot a$ in the above, as the
case may be. The accompanying table of B-score components (Table 2) fills in any essential
gaps. By reference thereto we can at once derive a result which will clarify subsequent reasoning.
When h sets of paired scores come from the same bivariate universe, we have before us h paired
mean A-scores and mean B-scores; and we may define in the usual way a coefficient of regression
of the mean B-score on the mean A-score. The expected value of this coefficient (Rm. ¿) is the 
true regression coefficient (kpa). This is deducible from the following considerations. If 
regression is linear and uniform, in the absence of a group effect 

Moa Meg ta Maat GC 

Cov (Ma, My) = E Ma. n — Ma)Mo. n 
ane EX Ma. n — JM, h A kya. EX Ma. a — i oe 
In thisřexpression 
E,(M...—M.)M..2.= EME. 1) — MEM, . 2) E — Mi = V(Mo. 1); 

Cov (Mas Mi) = E (Ma. ERE a)Me. a + Rea: V(M.. n). 
The expected value of the first term on the right, being the covariance of the mean A-scores 
and the mean of the independent residual e-scores, is zero and within the fixed-A set : 

E,. Cov(M,, M,) = Roa - VM n)- 


If Ron. ¿ is the observed sample value of the regression coefficient of the mean B-score on the 
mean A-score : 
| e Cov (Ma, Mo) 
mMm. ê V(M, f i > 
so Eig Base) = oe : : ; : >) 
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Meaning of Parameters. We have already seen that it is possible to split the variance of the 
B-score distribution into two components, of which one depends on error variation alone when 
regression of B on A is linear. The hypothesis that all our p paired scores come from the same 
bivariate universe in which regression is linear thus implies 


EQ — ro.» = = GG : ; : Tv) 


We may thus define as follows a statistic whose expected value will be of if regression is uniform 
and linear, no group effect being present : 


S 
a S, = (1 — rab - Vo . . s (v) 
Now we have before us also h paired mean scores from which we can derive the regression 
coefficient km. ¿ of (111) and the corresponding correlation coefficient 7m. Within the framework 
of the foregoing assumptions, we can therefore obtain an unbiased estimate of o? from the 
statistic (1 —7?)V(M,.,). The form of this statistic is deducible from the build-up of 
(1 — 72,)V,, which we may write as 


(1 TA ræ) Vo s oe V, = V, =V, + kia E 


ELI =G 4 = 


Similarly 
(1 — ra) (Mo. 2) + Tí V(Mo. n) = V(Mo. n) + Fia - V(Ma. 2). 
In this expression 
(1 — VM. n) = VWM.. n) = Ve — M(V.. a). 
Now we may write 


E,. MV.. n) = En. EdV a. n) = p2 ra "of 
k 
P TR h 2 


O, - 


e. Te . MV. n) = 


Whence from (iv) 
p-2 , p-f IN 


Edl nex Tm) V (Mo. h) e PA ti ee pa % = ° ° (vi) 


Within the same framework of assumptions, we may therefore define an unbiased estimate 
of o by the relation 


S 3 
e A => ni S = (1—2. a - ; . (vii) 
Within the groups, linear regression implies 
— 2 
EA) — roy Vo. = m 0%: 
Pr 
Whence from (i) above 
2 P — E 9 e 
E... MA — Tin.) Vo. n = E. o; j ; . (vi) 
Accordingly, we may define a third unbiased estimate of of by the relations 
S ; 
ELD =}; d= "5 S = MU — KIA i 2 ee) 


p — 2h 
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The expected value of the statistic (sî) depends neither on the assumption of uniform 
regression (kpa. n = Roa) nor on the assumption that there is no group effect (C, = C). If 
regression is uniform, we may indeed obtain a fourth statistic whose value does not depend on 
the existence of a group effect by using the square (7%, m) of the mean within-group correlation 
coefficient for the mean square M(r2, ,) of the within-group correlation coefficient in (vii). 
We define 7”, „ as follows : 

a  _ [MCov(a, dF 
obo = MV, MV...) 
[M Cov (a, b)]? 
M ( rs . n) 
Thus the statistic whose expected value we shall now determine is 


S, ; 
MA ra Vo. n= y = MV 0.2) — Tág mo M(Vo. 0) e - (xi) 


“Ps MV.) = ie So a) 


In this expression uniform linear regression implies 
M(V,. n) aa M(V.. ») + k. . MV.. n). 
Whence within the fixed-A set from (ii) 


E,. MOS a “4 Pee Mes RS ei 
In (x) above, 
M Cov (a, b) = M Cov( a, e) + ksa . M(Va. a), 
[M Cov (a, e)]? 
M(V a. n) 
In this expression the last term, the coefficient of the covariance in the second term and the 


denominator of the first are constants of the fixed-A set. Since the expected value of Cov (a, e) 
is zero, we may therefore write 


Ss MV) = + 2kya . M Cov (a, e) + kè, . M(V,. n). 


E.[M Cov (a, e)]? 


2 
MV...) + kga- MV 4. n) 


E, . ee . M(V.. n) m 
Whence from (xii), if regression is uniform, 
p—h , E[M Cov (a, e)]* 


C 


y MVoa.») 


To evaluate the second term on the right it will be convenient to put M . Cov (a, e) = Q, 
so that E(Q)=0. By definition of variance, we may therefore write EQ?) =07. If 


Zm = Cov(a, e) within the mth set: 


M Cov (a, j= 0 = E —— Sa. 


(xiii) 


EA. — Ti mm) - M(V5.0) = 


Each component Zm of O is independent of any other, being referable to different sub-universes. 
Hence if o? ,, is the variance of the distribution of Zm, 


m=h p2 
E,[M Cov (a, e]? == > ae O 
1 


m=i 
In accordance with (v) and (x) of 17.03 
da aaa 


on a 
A am Le 
; PE r aL )= we 


Whence from (xiv) 


2 
E, [M Cov (a, e)]? = 2 M(V,. »)- 


Thus from (xiii) above 


—h-—1 
EL — t.m) - MV a. 1) = oe ee 
We may thus define a fourth statistic which is an unbiased estimate of of if regression is uniform 
and linear regardless of the presence or absence of a group factor : 


S, : 

ns ie O o a 0) 

We can combine S, of (i), S; of (ix) and S, of (xvi) to obtain other statistics which are 
unbiased estimates of o; on the assumption that 


en s 


(i) regression is linear 


Ss 


ES) =e a 7 o S; = Sı — S; ; i . (xvii) 
(ii) regression is linear and uniform 

Ea ==: TER Se = Sy — 8; ; ; . (xviii) 
(iii) regression is linear and uniform, no other group effect 

Es) A a E S,= S,— Si UN) 


In specifying the assumptions subject to which the several statistics defined above are 
unbiased estimates of o7, we have not indicated what is of pivotal importance if we wish to 
prescribe an F-test in accordance with the procedure outlined above. Of those defined by the 
foregoing equations, sí being referable exclusively to variation within the group is an unbiased 
estimate of o? whether regression is uniform (kpa. n = Roa) or not and whether there is or is 
not (C, = C) a group effect involving a shift of origin of the score distribution. It is thus the 
fundamental yardstick statistic ; but if we are content with the outcome of a test of uniformity 
based thereon, we may proceed to use sí as a yardstick statistic. The effect of variation among 
the values of 7a». , will be to make the expected value of the square of the mean within-group 
regression coefficient (74».m) greater than it would otherwise be. We may therefore write 


regression linear and uniform E(s,) = E(s;:);  Elsg) = E(s4). 


regression linear, not uniform E(s,) > Elss); Els) > E(ss). 
The statistic sí of (iv) being referable to (1 — 1?,)V, will diminish if the set-up is such as to 
increase the expected value of the common 7,,. Either differences of the regressions inter se 
or an additive group factor will make the latter greater than otherwise, whence we can write 


regression linear and uniform, no additive effect: E(s;) = E(s3); E(s;) = Els) ; 
regression linear with one group effect or both: E(s5) < E(ss) ; 
regression linear and uniform with additive effect: E(s,) < E (sa). 

This leaves us with sz which depends on the regression of the paired mean scores. Like 
sj its value depends on variability of both regression coefficients within groups and the presence 
or absence of an additive group effect. Either sort of group effect will diminish its value. We 
may therefore state 

regression linear and uniform without additive effect: Els) = Els); 
regression linear and uniform with additive effect: E(s2) < Elsa). 

If we have reason to ask whether there would still be a significant correlation between 
A and B in the absence of a group effect we may confine our attention to the fraction of total 


variance which is not affected by either sort of variation which may arise from the group classi- 
fication, viz. M(V,.,). If there is no within group regression, we may then write 


Ss 


rey Ss = Sw.» = p. M(V,. n) in Table 3 . (xx) 


E(s2) es 0% > s3 ee 


Thus we have 
E¿(s3) =e o 53 = Ss ware Wig Ta? ° M(V,. a) . . . (xxi) 


We have now all the requisite statistics of the battery of tests outlined above, and may 
proceed to define an F-ratio based on two statistics whose expected values are identical, if the 
null hypothesis is correct, choosing as the numerator the one whose expected value must be 
greater if the same hypothesis is false. The proof that they are independent Chi-Square variates 
follows the familiar lines set forth in Chapter 16 and in 17.04. 


(i) Is there a group effect of either sort ?

\[ F = \frac{S_1 - S_3}{S_3} \cdot \frac{p - 2h}{2h - 2} \qquad \text{(xxii)} \]

(ii) Are the within-group regression coefficients identical ?

\[ F = \frac{S_4 - S_3}{S_3} \cdot \frac{p - 2h}{h - 1} \qquad \text{(xxiii)} \]

(iii) If regression is uniform, is there an additive effect ?

\[ F = \frac{S_1 - S_4}{S_4} \cdot \frac{p - h - 1}{h - 1} \qquad \text{(xxiv)} \]
Alternatively (for confirmation) 
E — 2 
Fis — de E Ss . . . . . (xxv) 
(iv) Is there regression not attributable to group effect ? 
s (S,— S, —h=—1 
„=$ &-90-4—) = 
s4 4 


(Ss =e ice in Table 3). 

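Note on Computations. Table 3 is a key to the appropriate sums of square deviations and 
products of deviations from the relevant means. As in 17.02 above, it simplifies machine 
calculations if we sum 5 columns for each block of $p_w$ paired scores : 

    $s_{a.w}$, $s_{b.w}$        total raw scores within block ; 
    $s_{aa.w}$, $s_{bb.w}$      total squares within block ; 
    $s_{ab.w}$                  total products within block. 

We may write the corresponding grand totals for the $p$ paired scores of all the $h$ blocks as $s_{a.o}$, 
$s_{b.o}$, $s_{aa.o}$, $s_{bb.o}$ and $s_{ab.o}$. We then have 

\[ S_{aa.w} = s_{aa.w} - \frac{s_{a.w}^2}{p_w}; \qquad S_{bb.w} = s_{bb.w} - \frac{s_{b.w}^2}{p_w} \qquad \text{(xxvii)} \]

\[ S_{aa.o} = s_{aa.o} - \frac{s_{a.o}^2}{p}; \qquad S_{bb.o} = s_{bb.o} - \frac{s_{b.o}^2}{p} \qquad \text{(xxviii)} \]

\[ S_{ab.w} = s_{ab.w} - \frac{s_{a.w} \cdot s_{b.w}}{p_w}; \qquad S_{ab.o} = s_{ab.o} - \frac{s_{a.o} \cdot s_{b.o}}{p} \qquad \text{(xxix)} \]

The statistics embodied in $S_2$ are obtainable from the grid tautologies of 11.05 and 11.06, viz. : 

\[ V = M(V) + V(M); \qquad \mathrm{Cov}(ab) = \mathrm{Cov}(M_a M_b) + M\,\mathrm{Cov}(ab). \]

Thus we have 

\[ S_{aa.m} = S_{aa.o} - S_{aa.p}; \qquad S_{bb.m} = S_{bb.o} - S_{bb.p} \qquad \text{(xxx)} \]

\[ S_{ab.m} = S_{ab.o} - S_{ab.p} \qquad \text{(xxxi)} \]

We may, however, use this relation as a check-up, if we compute directly 

\[ S_{aa.m} = \sum_{w=1}^{w=h} \frac{s_{a.w}^2}{p_w} - \frac{s_{a.o}^2}{p} \qquad \text{(xxxii)} \]

\[ S_{bb.m} = \sum_{w=1}^{w=h} \frac{s_{b.w}^2}{p_w} - \frac{s_{b.o}^2}{p} \qquad \text{(xxxiii)} \]

\[ S_{ab.m} = \sum_{w=1}^{w=h} \frac{s_{a.w} \cdot s_{b.w}}{p_w} - \frac{s_{a.o} \cdot s_{b.o}}{p} \qquad \text{(xxxiv)} \]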
Numerical Example. 
The table below exhibits three blocks each of six correlated variates a and b 
with corresponding squares and products for purposes of computation with totals at the foot. 

TABLE 3 

             Group I                |            Group II               |            Group III 
    a     b    ab    a²     b²      |    a     b    ab    a²     b²     |    a     b    ab    a²     b² 
    2    13    26     4    169      |    3    20    60     9    400     |    7     3    21    49      9 
    3    17    51     9    289      |    6    25   150    36    625     |    8     4    32    64     16 
    5    21   105    25    441      |    9    29   261    81    841     |   11    12   132   121    144 
    6    22   132    36    484      |   11    35   385   121   1225     |   13    17   221   169    289 
    8    25   200    64    625      |   12    37   444   144   1369     |   15    19   285   225    361 
   12    34   408   144   1156      |   13    37   481   169   1369     |   18    26   468   324    676 
   36   132   922   282   3164      |   54   183  1781   560   5829     |   72    81  1159   952   1495 

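Below we derive the appropriate sums as set out in Table 3 (Key for Computation) above. 

                 Group I     Group II    Group III 
$S_{aa.w}$         66          74          88 
$S_{bb.w}$        260         247.5       401.5 
$S_{ab.w}$        130         134         187 

$S_{aa.m}$ = 108        $S_{bb.m}$ = 867         $S_{ab.m}$ = −153 
$S_{aa.o}$ = 336        $S_{bb.o}$ = 1776        $S_{ab.o}$ = 298 
$S_{aa.p}$ = 228        $S_{bb.p}$ = 909         $S_{ab.p}$ = 451 

\[ S_1 = 1776 - \frac{(298)^2}{336} \simeq 1512; \qquad S_2 = 867 - \frac{(153)^2}{108} \simeq 650; \]

\[ S_3 = \left(260 - \frac{(130)^2}{66}\right) + \left(247.5 - \frac{(134)^2}{74}\right) + \left(401.5 - \frac{(187)^2}{88}\right) \simeq 13; \]

\[ S_4 = 909 - \frac{(451)^2}{228} \simeq 17. \]

Whence we have 

\[ \text{(i)} \quad F = \frac{1512 - 13}{13} \cdot \frac{12}{4} \simeq 345.9. \]

\[ \text{(ii)} \quad F = \frac{17 - 13}{13} \cdot \frac{12}{2} \simeq 1.84. \]

\[ \text{(iii)} \quad F = \frac{1512 - 17}{17} \cdot \frac{14}{2} \simeq 616. \]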


The reader may find it instructive to investigate the approximate relationship which subsists between 
the a and b scores in each group (reference to the column totals provides a clue). 
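
The arithmetic of the numerical example can be retraced mechanically. The sketch below is an illustration only, not part of the original text: the function and variable names are hypothetical, and the fourth b-score of Group I is taken as 22, the value implied by the tabulated product 132 and square 484. It computes the corrected sums of squares and products for each block, for the pooled within-block set, for the block means and for the grand total, and then forms the residual sums and the three F-ratios. Because it carries unrounded residual sums, its F-ratios differ a little from those obtained with the rounded figures 1512, 650, 13 and 17.

```python
# Paired scores (a, b) of the three groups in the numerical example.
# Assumption: the fourth b-score of Group I is 22 (implied by ab = 132, b^2 = 484).
groups = {
    "I":   ([2, 3, 5, 6, 8, 12],    [13, 17, 21, 22, 25, 34]),
    "II":  ([3, 6, 9, 11, 12, 13],  [20, 25, 29, 35, 37, 37]),
    "III": ([7, 8, 11, 13, 15, 18], [3, 4, 12, 17, 19, 26]),
}


def corrected_sums(a, b):
    """Return (S_aa, S_bb, S_ab): sums of squares and products about the means."""
    n = len(a)
    total_a, total_b = sum(a), sum(b)
    s_aa = sum(x * x for x in a) - total_a ** 2 / n
    s_bb = sum(y * y for y in b) - total_b ** 2 / n
    s_ab = sum(x * y for x, y in zip(a, b)) - total_a * total_b / n
    return s_aa, s_bb, s_ab


def residual(s_aa, s_bb, s_ab):
    """Sum of squares of b left over after linear regression on a."""
    return s_bb - s_ab ** 2 / s_aa


within = {name: corrected_sums(a, b) for name, (a, b) in groups.items()}
all_a = [x for a, _ in groups.values() for x in a]
all_b = [y for _, b in groups.values() for y in b]

total = corrected_sums(all_a, all_b)                      # suffix .o
pooled = tuple(sum(w[i] for w in within.values()) for i in range(3))  # suffix .p
means = tuple(t - q for t, q in zip(total, pooled))       # suffix .m

res_total = residual(*total)                              # about 1512
res_means = residual(*means)                              # about 650
res_separate = sum(residual(*w) for w in within.values()) # about 13
res_pooled = residual(*pooled)                            # about 17

h = len(groups)   # number of groups
p = len(all_a)    # number of paired scores

f_group = (res_total - res_separate) / res_separate * (p - 2 * h) / (2 * h - 2)
f_uniform = (res_pooled - res_separate) / res_separate * (p - 2 * h) / (h - 1)
f_additive = (res_total - res_pooled) / res_pooled * (p - h - 1) / (h - 1)

print("within-group sums:", within)
print("means:", means, "total:", total, "pooled:", pooled)
print("residual sums:", res_total, res_means, res_separate, res_pooled)
print("F-ratios:", f_group, f_uniform, f_additive)
```

The very large ratios for tests (i) and (iii), set beside a small ratio for test (ii), are what the text leads one to expect of data with closely similar within-group regressions and a marked shift of origin between groups.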
18.03 CAVEAT TO ANALYSIS OF COVARIANCE 


In another context, we have had occasion to remind ourselves that statistical theory provides 
no sufficient substitute either for common sense or for an intimate knowledge of external nature 
variously denominated natural history, clinical experience, intuition and good judgment. It is 
especially important to keep this truth in full view, if we are to assess the value of analysis of 
covariance as a tool of research. In appropriate circumstances, the results of its application 
may be highly suggestive and helpful. It is not an open sesame to all closed doors between 
ignorance and knowledge when the end in view is to assess the relevance of quantitative and 
qualitative putative sources to a particular type of variation. 

At the outset, it is necessary to emphasise (as stated in 18.01) that appropriate design of 
laboratory experiments (as opposed to field trials and industrial experimentation so-called) 
commonly offers a more direct and satisfactory approach to the issue which is the peculiar concern 
of the technique under discussion. While there may admittedly exist circumstances which 
make a putatively relevant quantitative source of contributory variation difficult or even im- 
possible to control in an experiment conducted to evaluate the significance of a second and 
qualitative criterion of classification, it is also true that such circumstances commonly exclude 
the possibility of taking precautions to assess the validity of the assumptions inherent in the 
method of 18.02. 

The tests dealt with in 18.02 are conditional on two assumptions. One is that regression 
is linear. The other is that the residual variation of which $\sigma^2$ is the measure is the same for 
all sub-universes. The second we can test, if in doubt, by methods mentioned elsewhere. 
Indeed, the test for uniformity of regression answers the question, as far as it is possible to give 
an answer to it, if there is no reason to dismiss the hypothesis that there is uniformity of regression. 
The assumption of linearity is not one which we can commonly and conclusively justify in situa- 
tions which compel us to fall back on the analysis of covariance as an alternative to a more direct 
procedure. The reason for this is one we have noted elsewhere (pp. 741 and 742) en passant. 
The linearity test of 17.04 breaks down, unless we can arrange matters so that we have more 
than one B-score value for at least some of the A-scores, as we can ensure in certain types of 
experimental design. In the type dealt with in 18.02, we have in fact to take our B-scores as they 
come. 
For both the reasons last stated, it is important to be quite clear about the credentials of 
the claim that statistical tests such as the Gosset t-test and any test based on an F-ratio permit 
us to make assertions with confidence about small samples. Formally, and in accordance 
with our initial assumptions (e.g. normally distributed scores), it is true to say that such 
tests, unlike tests in common use a generation since, rely on the distribution of sample values 
of the parameters of the relevant distribution, in contradistinction to the distribution of ratios 
involving the unknown parameters which we can at best estimate with assurance for very large 
samples. On the other hand, it is necessary to remind ourselves that a significance test of the 
sort under discussion can merely give us a rule for dismissing a null hypothesis without risk 
of doing so wrongly very often. It cannot give us good reasons for believing it, though we may 
indeed have derived reasons from other sources ; and a sample, if small enough, may give the 
test rule little chance of dismissing a null hypothesis which is false. 

To the author, the moral of this is clear. If we have before us large sub-samples in the 
set-up of 18.02, we have good enough reason to justify the conviction that regression of the 
B-score on the A-score is approximately linear. In any case, our data may be such as to exclude 
the possibility of checking this assumption by recourse to an appropriate significance test ; 
and in any case, the test does not conclusively prove that the null hypothesis is correct. In 
practice, therefore, analysis of covariance or any other technique conditional on an assumption 
(e.g. linearity) which we have not good reason to adopt for reasons other than the outcome of 
a statistical test, cannot base its legitimate claim to consideration on the economy of working 
with small samples. 

For a reason stated in 17.06, it may also be legitimate to express some doubt about the wisdom 
of assuming that the analysis of covariance is a reliable tool of research in the concurrent domain 
of sociology and econometrics. Within the consequential domain of agricultural field trials or 
nutritional science, the meaning of the significance tests is clear, and the implications of the 
principle of the fixed-A set—Churchill Eisenhart’s Model I approach (p. 548)—present no 
semantic difficulties. It is not equally clear what the Model I approach signifies in the unique 
historical situations of sociological enquiry. 

One other consideration bearing on the judicious use of Analysis of Covariance calls for 
comment; and is on all fours with a limitation perhaps too little emphasised in connexion 
with the parent technique of Chapter 13. We may speak of it as the dilution of the class effect. 
The method of 18.02 is, of course, applicable to situations in which we distinguish only two 
classes of paired scores. When the number of classes is large, there is always a possibility 
that the effect of others will conceal one which is out of step. For this reason, it is a wise 
precaution to calculate within-group correlation coefficients for comparison, and separate investi- 
gation, if the figures are suggestive. 


18.04 THE CONCEPT OF FACTOR PATTERN 


In different contexts we use the word statistics in several ways, the connexions between one 
and the other use being various and somewhat exiguous. Originally, it signified numerical 
information bearing on the affairs of State; and it is well to remind ourselves of this when we 
examine the controversies provoked by the applications of factor analysis in the field of psychology. 
Factor analysis is a statistical procedure largely developed in connection with what is basically 
an administrative issue, i.e. personnel selection. To trace its origins we must go back to work 
undertaken in the nineties by Binet and Simon with a view to devising tests of intellectual 
aptitude with more prognostic value than scholastic examinations. At the start, the yardstick 
of such so-called intelligence tests was their correspondence with teachers’ estimates of relative 
ability. As we have seen in Chapter 6 of Vol. I, this is essentially a problem of rank correla- 
tion. The pioneers of test-work explored from this standpoint a variety of puzzles, the general 
knowledge quiz and feats of memory presumptively unrelated to school training. There thus 
emerged a mass of information about the intercorrelation of test scores. 

Intercorrelation of two test scores in this setting signifies a correlation—rank or product- 
moment—based on the application of each of them to each individual of a group. In the idiom 
of our 2-face card pack model, of 12.00, the individual is the card and the test score is the number 
of pips on one or other face. In that of the Umpire Bonus Model, the individual is the particular 
trial and the test scores are the scores of the players. The fact that the results of say six tests 
of the same individual tie-up in the sense that there is a significant positive correlation for any 
pair of corresponding scores applied to the same, or to a comparable, group of individuals does 
not necessarily mean that they all measure the same aptitude. This will still be true if: 
(a) each test measures a congeries of unconnected attributes ; (b) the results of any pair of them 
depend in part on one such attribute which does not affect those of any remaining test. Hence 
arises the following question: is there any characteristic common to what we are measuring 
when we apply different so-called intelligence tests ? If so, and if there is indeed only one such 
sort of aptitude, it may be a verbal convenience to adopt the convention of restricting the use 
of the word intelligence thereto. 

The query last stated assumed a more provocative aspect when Spearman first propounded 
what it is customary to call the hierarchical principle embodying a feature of the lay-out we 
may meet when we arrange such test-score intercorrelations in a symmetrical grid. His inter- 
pretation embodied in the concept denoted by g (for general intelligence) started a controversy 
which led to the recognition of more complicated patterns. Factor analysis is the attempt to 
interpret them. At the outset then, let us be clear about what a factor pattern signifies. We 
shall assume that we have before us the results of applying 5 tests (A—E) to each boy or girl in 
a school form, and that we can therefore calculate the product-moment index of any particular 
pair. Having done so we may set them out gridwise as below on the left. To illustrate the 
meaning of the simplest kind of hierarchical pattern, we shall suppose that the numerical values 
are as shown on the right. 


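        A          B          C          D          E        |        A      B      C      D      E 
A       —      $r_{ab}$   $r_{ac}$   $r_{ad}$   $r_{ae}$     |  A     —     0.44   0.61   0.53   0.73 
B   $r_{ab}$       —      $r_{bc}$   $r_{bd}$   $r_{be}$     |  B    0.44    —     0.36   0.29   0.39 
C   $r_{ac}$   $r_{bc}$       —      $r_{cd}$   $r_{ce}$     |  C    0.61   0.36    —     0.43   0.55 
D   $r_{ad}$   $r_{bd}$   $r_{cd}$       —      $r_{de}$     |  D    0.53   0.29   0.43    —     0.50 
E   $r_{ae}$   $r_{be}$   $r_{ce}$   $r_{de}$       —        |  E    0.73   0.39   0.55   0.50    — 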


A close inspection of the figures shows that there is a rank correspondence, which comes 
into focus when we rearrange the items as below on the left. This hierarchical correspondence 
is not the only circumstance the figures suggest. Closer inspection shows that the ratio of items 
in any two rows of one column is roughly the same as that of corresponding items of the same 
two rows in another column. That this is so becomes evident, if we assign suitable border 
factors to each corresponding row and column, expressing the cell entries as their products. 
Thus we can closely reproduce the actual correlation matrix on the left below by multiplying 
factors assigned to heads of columns and row margins on the right: 


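        A      E      C      D      B      |             0.9    0.8    0.7    0.6    0.5 
A       —     0.73   0.61   0.53   0.44    |   0.9        —     0.72   0.63   0.54   0.45 
E      0.73    —     0.55   0.50   0.39    |   0.8       0.72    —     0.56   0.48   0.40 
C      0.61   0.55    —     0.43   0.36    |   0.7       0.63   0.56    —     0.42   0.35 
D      0.53   0.50   0.43    —     0.29    |   0.6       0.54   0.48   0.42    —     0.30 
B      0.44   0.39   0.36   0.29    —      |   0.5       0.45   0.40   0.35   0.30    — 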

The lay-out of the figures of this fictitious example illustrates the simplest type of factor 
pattern, i.e. the hierarchical or single factor pattern for which Spearman first offered a theoretical 
interpretation. How later test results suggest more complex patterns the following actual results 
cited by Burt will serve to illustrate : 

               Composition  Handicraft  Spelling  Drawing  Reading  Writing 
Composition         —          0.30       0.49     0.38     0.58     0.44 
Handicraft         0.30         —         0.09     0.50     0.10     0.28 
Spelling           0.49        0.09        —       0.12     0.46     0.25 
Drawing            0.38        0.50       0.12      —       0.13     0.36 
Reading            0.58        0.10       0.46     0.13      —       0.21 
Writing            0.44        0.28       0.25     0.36     0.21      — 

A pattern, suggestive of a hierarchical relationship between a sub-group of this battery 
of tests, emerges when we rearrange the figures as below : 

               Composition  Reading  Spelling  Writing  Drawing  Handicraft 
Composition         —         0.58     0.49     0.44     0.38      0.30 
Reading            0.58        —       0.46     0.21     0.13      0.10 
Spelling           0.49       0.46      —       0.25     0.12      0.09 
Writing            0.44       0.21     0.25      —       0.36      0.28 
Drawing            0.38       0.13     0.12     0.36      —        0.50 
Handicraft         0.30       0.10     0.09     0.28     0.50       — 


To what extent it is possible to interpret such a pattern as the above with confidence, we 
shall discuss at a later stage. First, let us examine the rationale of Spearman’s hierarchical 
criterion of the single factor pattern. Spearman himself, and many who have followed him, 
relied on reasoning which involves an unnecessary and not necessarily true limitation, viz. the 
assumption that linear regression is itself a necessary consequence of linear concomitant variation. 
On the other hand, our exploration of the Umpire Bonus Model in 12.01 opens the door to a very 
simple derivation of the hierarchical criterion, if we assume with Spearman that the test score 
of an individual is the algebraic sum of independent hypothetical components. The single 
factor pattern for test scores $x_a$, $x_b$, etc., then takes the following form, $x_u$ being the common 
component and $x_{a.o}$, $x_{b.o}$, etc., the specific one : 

\[ x_a = A_u \cdot x_u + A_o \cdot x_{a.o}; \]
\[ x_b = B_u \cdot x_u + B_o \cdot x_{b.o}; \quad \text{etc.} \]


Strictly speaking, the assumption of the statistical independence of the two score com- 
ponents is unnecessary. All we require to assume is that their covariance is zero. The same 
is true of the score components in the Umpire Bonus Model set-up. 


18.05 DERIVATION OF THE HIERARCHICAL CRITERION 


The foregoing system of equations definitive of the individual test score is formally identical 
with (vi) in 12.01, the common factor being the umpire’s bonus. For this set-up, we have 
established the following relation specified by (ix) of 12.01: 

\[ r_{ab} = r_{au} \cdot r_{bu} \]

Given that all five players receive a single bonus from one and the same umpire, we may 
thus lay out the correlation matrix of the intercorrelations between their scores in such a way 
that each cell entry is the product of the two border factors respectively definitive of the p-m 
coefficient of the umpire’s own score with that of one or other player. This establishes the 
conclusion that the single factor hypothesis is a sufficient condition of the hierarchical pattern ; 
but we have still to show that it is a necessary one, i.e. the only feasible explanation. 


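Before exploring this issue, let us notice a principle inherent in the hierarchical criterion. 
If we consider any four players of a set of this sort, we notice that we can pair them off thus 
AB.CD; AC.BD; AD.BC. The products of the intercorrelation of the scores of one 
pair with that of the residual pair are then as follows : 

\[ r_{ab} \cdot r_{cd} = \frac{A_u B_u \sigma_u^2}{\sigma_a \sigma_b} \cdot \frac{C_u D_u \sigma_u^2}{\sigma_c \sigma_d}; \]

\[ r_{ac} \cdot r_{bd} = \frac{A_u C_u \sigma_u^2}{\sigma_a \sigma_c} \cdot \frac{B_u D_u \sigma_u^2}{\sigma_b \sigma_d}; \]

\[ r_{ad} \cdot r_{bc} = \frac{A_u D_u \sigma_u^2}{\sigma_a \sigma_d} \cdot \frac{B_u C_u \sigma_u^2}{\sigma_b \sigma_c}. \]

Hence we arrive at Spearman’s tetrad equations, viz. : 

\[ r_{ab} \cdot r_{cd} = r_{ac} \cdot r_{bd} = r_{ad} \cdot r_{bc} = \frac{A_u B_u C_u D_u V_u^2}{\sigma_a \sigma_b \sigma_c \sigma_d}, \]

so that 

\[ r_{ab} r_{cd} - r_{ac} r_{bd} = 0; \qquad r_{ab} r_{cd} - r_{ad} r_{bc} = 0; \qquad r_{ac} r_{bd} - r_{ad} r_{bc} = 0. \]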
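
The tetrad equations lend themselves to a direct numerical check. The sketch below is an illustration only, with hypothetical names: it builds an exactly hierarchical grid from five border factors, confirms that every tetrad difference vanishes apart from rounding error, and then computes the same differences for the slightly rounded intercorrelations of the fictitious example of 18.04, where they are small but not exactly zero.

```python
from itertools import combinations


def tetrad_differences(r, a, b, c, d):
    """Return the three tetrad differences for tests a, b, c, d.

    r is a function giving the intercorrelation of any two tests.
    """
    ab_cd = r(a, b) * r(c, d)
    ac_bd = r(a, c) * r(b, d)
    ad_bc = r(a, d) * r(b, c)
    return (ab_cd - ac_bd, ab_cd - ad_bc, ac_bd - ad_bc)


# Border factors of the fictitious example (correlation of each test with the factor).
loadings = {"A": 0.9, "B": 0.5, "C": 0.7, "D": 0.6, "E": 0.8}


def single_factor_r(x, y):
    """Intercorrelation under an exact single factor pattern."""
    return loadings[x] * loadings[y]


# Rounded intercorrelations as tabulated in 18.04.
observed = {
    ("A", "B"): 0.44, ("A", "C"): 0.61, ("A", "D"): 0.53, ("A", "E"): 0.73,
    ("B", "C"): 0.36, ("B", "D"): 0.29, ("B", "E"): 0.39,
    ("C", "D"): 0.43, ("C", "E"): 0.55, ("D", "E"): 0.50,
}


def observed_r(x, y):
    return observed[(x, y)] if (x, y) in observed else observed[(y, x)]


def worst_tetrad(r):
    """Largest absolute tetrad difference over every set of four tests."""
    return max(
        abs(t)
        for four in combinations(loadings, 4)
        for t in tetrad_differences(r, *four)
    )


worst_exact = worst_tetrad(single_factor_r)
worst_observed = worst_tetrad(observed_r)
print("exact single factor grid:", worst_exact)
print("tabulated grid:", worst_observed)
```

Whether differences of the size found for the tabulated grid are compatible with zero is a question of sampling error, which this sketch does not address.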


Thus a single factor pattern implies that for any 4-fold set of test scores the tetrad differences 
do not significantly differ from zero. In deriving this result we have assumed a strictly linear 
law of the composition of the scores. Let us now examine the consequences of a non-linear 
law for the same model, viz. : 

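\[ x_a = A_u x_u^p + A_o x_{a.o}; \]
\[ x_b = B_u x_u^q + B_o x_{b.o}; \]
\[ x_c = C_u x_u^s + C_o x_{c.o}; \]
\[ x_d = D_u x_u^t + D_o x_{d.o}. \]

We may then write 

\[ \mathrm{Cov}(x_a, x_b) = E(x_a \cdot x_b) = A_u B_u E(x_u^{p+q}) + A_u B_o E(x_u^p x_{b.o}) + A_o B_u E(x_u^q x_{a.o}) + A_o B_o E(x_{a.o} x_{b.o}). \]

If there is zero covariance between all powers of the two score components and $m_{p+q}$ is the 
$(p + q)$th mean moment of the umpire’s score distribution, we therefore have 

\[ \mathrm{Cov}(x_a, x_b) = A_u B_u E(x_u^{p+q}) = A_u B_u m_{p+q}. \]

Similarly, 

\[ \mathrm{Cov}(x_c, x_d) = C_u D_u m_{s+t}. \]

We thus derive 

\[ r_{ab} \cdot r_{cd} = \frac{A_u B_u C_u D_u}{\sigma_a \sigma_b \sigma_c \sigma_d} \cdot m_{p+q} \cdot m_{s+t}; \]

\[ r_{ac} \cdot r_{bd} = \frac{A_u B_u C_u D_u}{\sigma_a \sigma_b \sigma_c \sigma_d} \cdot m_{p+s} \cdot m_{q+t}; \]

\[ r_{ad} \cdot r_{bc} = \frac{A_u B_u C_u D_u}{\sigma_a \sigma_b \sigma_c \sigma_d} \cdot m_{p+t} \cdot m_{q+s}. \]

Thus the tetrad differences will vanish if, and only if, 

\[ m_{p+q} \cdot m_{s+t} = m_{p+s} \cdot m_{q+t} = m_{p+t} \cdot m_{q+s}. \]

This will be true if $p = q = s = t$, in which case the law of composition is linear since we may 
replace $x_u^p$, $x_u^q$, etc., by $z_u$. 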
If two umpires (U and W) contribute to the total score of each player, (xii) of 12.01 prescribes 
that 

\[ r_{ab} = r_{au} \cdot r_{bu} + r_{aw} \cdot r_{bw}, \text{ etc.}; \]

\[ r_{au} = A_u \frac{\sigma_u}{\sigma_a}; \qquad r_{aw} = A_w \frac{\sigma_w}{\sigma_a}. \]

Whence we derive 

\[ r_{ab} \cdot r_{cd} = (r_{au} r_{bu} + r_{aw} r_{bw})(r_{cu} r_{du} + r_{cw} r_{dw}), \text{ etc.} \]

Our tetrads thus reduce to 

\[ r_{ab} \cdot r_{cd} = r_{au} r_{bu} r_{cu} r_{du} + r_{aw} r_{bw} r_{cw} r_{dw} + r_{au} r_{bu} r_{cw} r_{dw} + r_{aw} r_{bw} r_{cu} r_{du}; \]

\[ r_{ac} \cdot r_{bd} = r_{au} r_{bu} r_{cu} r_{du} + r_{aw} r_{bw} r_{cw} r_{dw} + r_{au} r_{cu} r_{bw} r_{dw} + r_{aw} r_{cw} r_{bu} r_{du}; \]

\[ r_{ad} \cdot r_{bc} = r_{au} r_{bu} r_{cu} r_{du} + r_{aw} r_{bw} r_{cw} r_{dw} + r_{au} r_{du} r_{bw} r_{cw} + r_{aw} r_{dw} r_{bu} r_{cu}. \]

The first two terms in each of the above are identical. So the tetrad differences will vanish 
only if 

\[ r_{au} r_{bu} r_{cw} r_{dw} + r_{aw} r_{bw} r_{cu} r_{du} = r_{au} r_{cu} r_{bw} r_{dw} + r_{aw} r_{cw} r_{bu} r_{du} = r_{au} r_{du} r_{bw} r_{cw} + r_{aw} r_{dw} r_{bu} r_{cu}. \]

From the first pair we get 

\[ r_{au} \cdot r_{dw} (r_{bu} r_{cw} - r_{bw} r_{cu}) = r_{aw} \cdot r_{du} (r_{bu} r_{cw} - r_{bw} r_{cu}), \]

\[ \frac{r_{au}}{r_{aw}} = \frac{r_{du}}{r_{dw}}. \]

By pairing off each of the three identities, we thus get 

\[ \frac{r_{au}}{r_{aw}} = \frac{r_{bu}}{r_{bw}} = \frac{r_{cu}}{r_{cw}} = \frac{r_{du}}{r_{dw}} = K. \]

Hence we may write 
bare tee. Ie) i O 
This will be true if 


A O O SB EK, +B; Xa, ote 
AX LA AAA BY + BX, = BEEN eto. 


From an algebraic viewpoint, we may therefore say that the tetrad identities are valid for 
a set-up involving 2 umpires only if the composite bonus is replaceable by that of either alone 
with appropriate change of scale. This resolves a controversy concerning how far we are entitled 
to regard Spearman’s g as a single entity. The answer is that it behaves as such in the Spearman 
framework of test material, in the sense that the carbon atom behaves as a unit in the customary 
manipulations of chemical analysis. This does not exclude the possibility that it might behave 
otherwise in a different framework as does the carbon atom under the impact of radioactive 
emanations. 
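
The algebraic conclusion about two umpires can be illustrated numerically. In the sketch below, which is an illustration only with hypothetical names and made-up loadings, each of four tests has a correlation with umpire U and one with umpire W, and each intercorrelation is the sum of the two products as in the relation quoted from (xii) of 12.01. When the W-correlations are a fixed multiple of the U-correlations the tetrad differences vanish; when they are not, the differences depart from zero.

```python
def tetrads(u, w):
    """Tetrad differences for four tests with correlations u and w on two umpires."""

    def r(i, j):
        # Intercorrelation of tests i and j: r_iu * r_ju + r_iw * r_jw.
        return u[i] * u[j] + w[i] * w[j]

    a, b, c, d = range(4)
    ab_cd = r(a, b) * r(c, d)
    ac_bd = r(a, c) * r(b, d)
    ad_bc = r(a, d) * r(b, c)
    return (ab_cd - ac_bd, ab_cd - ad_bc, ac_bd - ad_bc)


# Hypothetical correlations of tests A, B, C, D with umpire U.
u = [0.6, 0.5, 0.4, 0.3]

# Case 1: correlations with W proportional to those with U (constant ratio).
w_proportional = [0.5 * x for x in u]

# Case 2: correlations with W not proportional to those with U.
w_free = [0.1, 0.4, 0.2, 0.5]

proportional_case = tetrads(u, w_proportional)
free_case = tetrads(u, w_free)
print("proportional:", proportional_case)
print("not proportional:", free_case)
```

In the proportional case the two bonuses are indistinguishable from a single bonus on a changed scale, which is the sense in which the composite behaves as a unit.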

With the qualification last stated, we can say that the single factor postulate is both a sufficient 
and necessary condition of a strictly hierarchical pattern, as defined in 18.04. If we now fill 
in the empty diagonal cells of our correlation matrix by the appropriate entries, viz. : $r_{au}^2$, $r_{bu}^2$, 
etc., each such statistic has an intelligible meaning vis-à-vis our bonus model, i.e. the correlation 
of the player’s score with that of the umpire. We have yet to give these diagonal cell entries 
a meaning in the domain of factor analysis ; and we can do so, if we go back to our model, 
recalling that 

\[ V_a = A_u^2 V_u + A_o^2 V_{a.o}; \]

\[ r_{au} = \frac{A_u \sigma_u}{\sigma_a}. \]
The first of these two equations exhibits the breakdown of the variance of the A-score distribution 
into 2 moieties, one we may call V, the part referable to the bonus which the players have in 
common, and one V, the part referable to what each player records as his individual score. 'Thus 
the proportionate contribution of the bonus to the total variance is 
\[ \frac{A_u^2 V_u}{V_a} = r_{au}^2. \]

In terms of factor analysis, each hypothetical diagonal entry of the test-score correlation 
matrix when completed is therefore the proportionate contribution of the hypothetical common 
factor to the variance of a test-score distribution. To proceed further, we may simplify our 
problem by assuming that our test scores are standard scores as in 12.07, and that the total 
variance of the test-score distribution is therefore unity. 


18.06 RELIABILITY, ATTENUATION AND COMMUNALITY 


In the real world, no performance test is perfect in the sense that successive applications to the 
same individual would yield absolutely identical score values ; and we may take stock of the 
implications of the imperfect reliability of any test by a simple physical analogy to the Umpire 
Bonus set-up. Within certain limits the law relating the load to the extension of a spring is very 
closely linear, and we may suppose that we apply to each of two springs A and B the same load 
at the same trial and different loads at different trials, the test score ($x_a$, $x_b$) being the length 
of the spring. To apply the test we use a vernier which is fallible like all instruments, and we 
may therefore regard the test score as having 2 components: (i) its true value proportional 
to the applied force ($x_f$) which varies from trial to trial in accordance with Hooke’s law; (ii) a 
random component $x_e$; so that 

\[ x_a = A_f \cdot x_f + x_{ea} \quad \text{and} \quad x_b = B_f \cdot x_f + x_{eb}. \]

If we use the same vernier, the variance of the distribution of the error component is the same 
throughout, and 

\[ r_{ab} = \frac{A_f B_f V_f}{\sqrt{(A_f^2 V_f + V_e)(B_f^2 V_f + V_e)}}. \]


With this analogy to the Umpire Bonus set-up in mind, we may now formulate the com- 
ponents of two composite scores recorded for the same group of persons in the following terms : 


(a) one ($x_f$) which assesses a common attribute ; 
(b) one ($x_{sa}$ or $x_{sb}$) which assesses a specific attribute ; 
(c) one ($x_{ea}$, $x_{eb}$) which refers to error of observation, or what is indistinguishable therefrom, 
i.e. uncontrolled circumstances affecting the response of the test subject. 

We then write our scoring system in terms of score deviations as 

\[ x_a = A_f \cdot x_f + A_s \cdot x_{sa} + x_{ea}; \]
\[ x_b = B_f \cdot x_f + B_s \cdot x_{sb} + x_{eb}. \]

The error and specific components have by hypothesis zero covariance, and they will be 
indistinguishable if we apply once only each composite test to each member of the test group of 
persons, so that we might then write 

\[ A_o \cdot x_{a.o} = A_s \cdot x_{sa} + x_{ea} \quad \text{and} \quad B_o \cdot x_{b.o} = B_s \cdot x_{sb} + x_{eb} \]

for the components peculiar to each test. Otherwise, we must write 

\[ r_{ab} = \frac{A_f B_f V_f}{\sqrt{(A_f^2 V_f + A_s^2 V_{sa} + V_{ea})(B_f^2 V_f + B_s^2 V_{sb} + V_{eb})}} \qquad \text{(i)} \]


In the absence of error, we might write the true correlation (r,,..) in accordance with the 
implicit assumption V,, = 0 =V,,, so that 
212772 
o A;B iV; e 

ab, 2 

: (A; V; + A Fa) (BV, a E; V s») 
oe <> A V; ae Asa 53 2 BV; E BV sy ir Vo, (ii) 
Tab A; V; a A Fas B; V; a5 B; Vs 
Let us now suppose that we repeat the same test on each person, so that we may correlate the 
first composite test score of each person with his or her second. In this set-up $x_{sa}$ and $x_f$ are 
each common components of the two scores at the same trial (i.e. the two scores of the same 
person) ; but the error component differs from trial to trial. For test A we may therefore write 
our equations as 

\[ x_{a1} = A_f x_f + A_s x_{sa} + x_{e1}; \]
\[ x_{a2} = A_f x_f + A_s x_{sa} + x_{e2}. \]

The situation so described is one in which $x_f$ and $x_{sa}$ correspond to the independent 
contributions of different umpires and $x_{e1}$, $x_{e2}$ correspond to the individual components $x_{a.o}$ 
and $x_{b.o}$ of our bonus model. On the assumption that errors of a given magnitude occur with 
equal frequency on both occasions $V_{e1} = V_{e2} = V_{ea}$. Thus we may write the product-moment 
coefficient of the test-retest as 

\[ r_{aa} = \frac{A_f^2 V_f + A_s^2 V_{sa}}{A_f^2 V_f + A_s^2 V_{sa} + V_{ea}} = \frac{A_f^2 V_f + A_s^2 V_{sa}}{V_a} \qquad \text{(iii)} \]

Similarly, we may write 

\[ r_{bb} = \frac{B_f^2 V_f + B_s^2 V_{sb}}{B_f^2 V_f + B_s^2 V_{sb} + V_{eb}} = \frac{B_f^2 V_f + B_s^2 V_{sb}}{V_b} \qquad \text{(iv)} \]


By substitution of (iii) and (iv) in (ii) we therefore obtain the so-called correction formula for 
attenuation, viz. : 


AE gr B e E l ; : E 
Taa+ Top VT aa a 
In this formula, the quantity on the left is the expected value of the product-moment correlation 
between A and B test scores in the absence of errors incident to the carrying out of the test, $r_{ab}$ 
being its crude value. We may speak of $r_{aa}$ and $r_{bb}$ as coefficients of reliability, since their value 
is unity if there is perfect test-retest agreement, in which case $x_e$ is constant and $V_e$ is zero. 
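
The correction for attenuation amounts to dividing the crude correlation by the square root of the product of the two reliability coefficients. The sketch below is an illustration only: the function name is hypothetical and the three figures fed to it are invented for the purpose, not taken from any data in the text.

```python
import math


def correct_for_attenuation(r_ab, r_aa, r_bb):
    """Divide the crude correlation r_ab by sqrt(r_aa * r_bb).

    r_aa and r_bb are the test-retest reliability coefficients of tests A and B.
    """
    if not (0.0 < r_aa <= 1.0 and 0.0 < r_bb <= 1.0):
        raise ValueError("reliability coefficients must lie in (0, 1]")
    return r_ab / math.sqrt(r_aa * r_bb)


# Invented figures: crude correlation 0.42, reliabilities 0.81 and 0.64.
corrected = correct_for_attenuation(0.42, 0.81, 0.64)
print(f"corrected correlation: {corrected:.3f}")

# With perfectly reliable tests the correction leaves the crude value unchanged.
unchanged = correct_for_attenuation(0.42, 1.0, 1.0)
print(f"with perfect reliability: {unchanged:.3f}")
```

Since sampling error affects all three coefficients, a corrected value computed from a small sample may exceed unity; the function makes no attempt to guard against this.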
In the set-up under consideration, we may label the components of our test-score deviations, 
thus 

TEST SCORE        Communality       Specificity           Error 
  $x_a$      =    $A_f x_f$     +   $A_s x_{sa}$     +    $x_{ea}$ 
  $x_b$      =    $B_f x_f$     +   $B_s x_{sb}$     +    $x_{eb}$ 
                  |----------- RELIABILITY -----------|   ERROR 


In this set-up the correlation between tests A and B assessed by the p-m index ($r_{ab}$) of the 
test scores $x_a$, $x_b$ is comparable with the correlation of the scores of two players who receive 
multiples of the same bonus from one umpire in the model of 12.01. If we could actually isolate 
the common factor ($x_f$) equivalent to the umpire’s bonus, we could then record $r_{af}$ or $r_{bf}$, the 
test-score factor correlation corresponding to the correlation between the umpire’s score and 
that of the player. Formally, this is 

\[ r_{af}^2 = \frac{A_f^2 V_f}{V_a} \quad \text{and} \quad r_{bf}^2 = \frac{B_f^2 V_f}{V_b}. \]

The corresponding components of variance for test A are thus : 

Source            Actual                            Proportionate 
Total             $V_a$                             1 
Communality       $A_f^2 V_f$                       $r_{af}^2$ 
Specificity       $A_s^2 V_{sa}$                    $r_{aa} - r_{af}^2$ 
Reliability       $A_f^2 V_f + A_s^2 V_{sa}$        $r_{aa}$ 
Error             $V_{ea}$                          $1 - r_{aa}$ 
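Let us now return to our equations exhibiting the three additive components, and express 
them in standard form, i.e. 

\[ z_a = \frac{x_a}{\sigma_a}, \text{ etc.} \]

Thus we may put 

\[ \frac{x_a}{\sigma_a} = \frac{A_f \sigma_f}{\sigma_a} \cdot \frac{x_f}{\sigma_f} + \frac{A_s \sigma_{sa}}{\sigma_a} \cdot \frac{x_{sa}}{\sigma_{sa}} + \frac{\sigma_{ea}}{\sigma_a} \cdot \frac{x_{ea}}{\sigma_{ea}}. \]

We may then write 

\[ \frac{A_f \sigma_f}{\sigma_a} = a_f; \qquad \frac{A_s \sigma_{sa}}{\sigma_a} = a_s; \qquad \frac{\sigma_{ea}}{\sigma_a} = a_e. \]

The definitive equation of the A test score thus becomes 

\[ z_a = a_f z_f + a_s z_{sa} + a_e z_{ea}. \]

The variance of the distribution of our standard scores being unity, we may therefore write 

\[ a_f^2 + a_s^2 + a_e^2 = 1; \]

\[ a_f^2 = \frac{A_f^2 V_f}{V_a} = r_{af}^2; \]

\[ a_f^2 + a_s^2 = \frac{A_f^2 V_f + A_s^2 V_{sa}}{V_a} = r_{aa}; \]

\[ a_e^2 = \frac{V_{ea}}{V_a} = 1 - r_{aa}. \]

Whence we may write the proportionate contributions to total variance as 

Communality       $a_f^2 = r_{af}^2$ ; 
Reliability       $a_f^2 + a_s^2 = r_{aa}$ ; 
Specificity       $a_s^2 = r_{aa} - r_{af}^2$ ; 
Error             $a_e^2 = 1 - r_{aa}$. 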


Here a word of caution may not be amiss. It might seem more consistent with the pattern
of the correlation matrix to denote by $r_{aa}$, $r_{bb}$, etc. the entries in the left-right-downwards
diagonal. We correctly label them in the above symbolism as $r_{af}^2$, $r_{bf}^2$, etc. In the jargon of
factor analysis we speak of $a_f = r_{af}$ as a factor loading or saturation. When there are several
common factors contributing to a set of intercorrelations, we may number them as below :

$$z_a = a_1 z_1 + a_2 z_2 + a_3 z_3 + \ldots + a_s z_{sa} + a_e z_{ea} ;$$
$$z_b = b_1 z_1 + b_2 z_2 + b_3 z_3 + \ldots + b_s z_{sb} + b_e z_{eb}$$
It follows from results obtained in 12.01 that
$$r_{a1} = a_1 ; \quad r_{a2} = a_2, \text{ etc.}$$
The total communalities w.r.t. the two tests are then
$$a_1^2 + a_2^2 + a_3^2 + \ldots \quad \text{and} \quad b_1^2 + b_2^2 + b_3^2 + \ldots$$
The p-m intercorrelation for tests A and B is

$$r_{ab} = a_1 b_1 + a_2 b_2 + a_3 b_3 + \ldots$$
18.07 THE SINGLE FACTOR PATTERN


We have seen that a truly hierarchical pattern embodies the set of identities called vanishing
tetrad differences. Before we explore a way of determining the factor loadings for such a pattern,
it will be useful to recognise another set of identities involving triads, e.g. $r_{ab}$, $r_{ac}$ and $r_{bc}$. For
the single factor pattern in the symbols of 18.05,

$$r_{ab} \cdot r_{ac} = (r_{au} \cdot r_{bu})(r_{au} \cdot r_{cu}) = r_{au}^2 \cdot r_{bu} \cdot r_{cu} = r_{au}^2 \cdot r_{bc}$$

$$\therefore \frac{r_{ab} \cdot r_{ac}}{r_{bc}} = r_{au}^2 \qquad \text{(i)}$$

Similarly we derive

$$\frac{r_{ab} \cdot r_{ad}}{r_{bd}} = r_{au}^2 = \frac{r_{ac} \cdot r_{ad}}{r_{cd}} \qquad \text{(ii)}$$

From any four intercorrelations involving test A we can thus derive 3 estimates of the
communality $r_{au}^2 = a_u^2$, and similarly 3 of $b_u^2$, $c_u^2$, $d_u^2$, etc. It follows that the triad ratios defined
by (i) and (ii) cannot significantly exceed unity and should not significantly differ inter se if
the product in the numerator has the same common factor for all. In practice, however, we
shall not expect perfect agreement. For the example of a good hierarchical pattern in 18.04
(p. 785) the triads involving the A test communality are ABC, ABD, ABE, ACD, ACE, ADE.
These yield the following numerical values of $r_{au}^2$: 0.75, 0.80, 0.83, 0.75, 0.81, 0.77. This
raises the question : how can we choose a satisfactory average ? We may write the sum of the
numerator products of the foregoing set of triads in the 5-test table as

$$S_a = r_{ab} r_{ac} + r_{ab} r_{ad} + r_{ab} r_{ae} + r_{ac} r_{ad} + r_{ac} r_{ae} + r_{ad} r_{ae}$$
From the derivation of (i) and (ii) we see that
$$S_a = r_{au}^2 (r_{bu} r_{cu} + r_{bu} r_{du} + r_{bu} r_{eu} + r_{cu} r_{du} + r_{cu} r_{eu} + r_{du} r_{eu})$$

Similarly, we may write the sum of terms corresponding to the denominators of (i) or
(ii) as
$$S_{ij} = r_{bc} + r_{bd} + r_{be} + r_{cd} + r_{ce} + r_{de}$$
$$= r_{bu} r_{cu} + r_{bu} r_{du} + r_{bu} r_{eu} + r_{cu} r_{du} + r_{cu} r_{eu} + r_{du} r_{eu}$$
Whence we have
$$r_{au}^2 = \frac{S_a}{S_{ij}} \qquad \text{(iii)}$$

In this expression $S_a$ means: add up all the products of two different cell entries in either the
A-row or the A-column of the correlation matrix. The denominator $S_{ij}$ means : add all the entries
on one side of the left-right-downwards diagonal, excluding those in both the A-row and the A-column.


For the specimen set-up on p. 785 (18.04) we have :

       A     B          C          D          E          Product Total $S_a$
A      -     $r_{ab}$   $r_{ac}$   $r_{ad}$   $r_{ae}$   $r_{ab}r_{ac} + r_{ab}r_{ad} + r_{ab}r_{ae} + r_{ac}r_{ad}$, etc.
B            -          $r_{bc}$   $r_{bd}$   $r_{be}$
C                       -          $r_{cd}$   $r_{ce}$
D                                  -          $r_{de}$
E                                             -
Total $S_{ij}$ (rows B to E) : $r_{bc} + r_{bd} + r_{be} + r_{cd} + r_{ce} + r_{de}$

The pooled value of $r_{au}^2$ in (iii) for the numerical example of p. 785 would thus be
$$r_{au}^2 = [(0.73)(0.61) + (0.73)(0.53) + (0.73)(0.44) + (0.61)(0.53) + (0.61)(0.44) + (0.53)(0.44)]$$
$$\div\ (0.55 + 0.50 + 0.39 + 0.43 + 0.36 + 0.29),$$
$$\therefore r_{au}^2 \simeq 0.79 \quad \text{and} \quad r_{au} \simeq 0.89$$
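
The pooling rule (iii) is easy to check by machine. The sketch below uses only the figures quoted in the text for the A-row and for the remaining cell entries; the variable names are hypothetical. Because (iii) needs only the sum of the entries outside the A-row, the order in which those six entries are listed is immaterial.

```python
from itertools import combinations

# A-row entries quoted in the text: r_ab, r_ac, r_ad, r_ae.
r_a = [0.73, 0.61, 0.53, 0.44]
# Entries on one side of the diagonal outside the A-row and A-column.
r_rest = [0.55, 0.50, 0.39, 0.43, 0.36, 0.29]

# Numerator: sum of all products of two different A-row entries.
s_a = sum(x * y for x, y in combinations(r_a, 2))
# Denominator: plain sum of the remaining entries.
s_ij = sum(r_rest)

pooled = s_a / s_ij          # pooled communality of test A
loading = pooled ** 0.5      # corresponding factor loading
print(round(pooled, 3), round(loading, 3))
```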
As an alternative to this procedure, there is a hit-and-miss method of determining the
factor loadings ($r_{au}$, etc.), and its rationale is instructive in connexion with analysis of data
which do not conform to a single factor pattern. To understand it, we must take stock of
relations which subsist between the grand total and column or row totals of the cell entries of
the correlation matrix. A 4-test matrix for an exact single factor pattern will suffice to make
this clear.

        A                         B                         C                         D
A       $r_{au}^2$                $r_{ab} = r_{au} r_{bu}$  $r_{ac} = r_{au} r_{cu}$  $r_{ad} = r_{au} r_{du}$
B       $r_{ab} = r_{au} r_{bu}$  $r_{bu}^2$                $r_{bc} = r_{bu} r_{cu}$  $r_{bd} = r_{bu} r_{du}$
C       $r_{ac} = r_{au} r_{cu}$  $r_{bc} = r_{bu} r_{cu}$  $r_{cu}^2$                $r_{cd} = r_{cu} r_{du}$
D       $r_{ad} = r_{au} r_{du}$  $r_{bd} = r_{bu} r_{du}$  $r_{cd} = r_{cu} r_{du}$  $r_{du}^2$                Grand Total
Total   $t_a$                     $t_b$                     $t_c$                     $t_d$                     $T$


In the foregoing schema $t_a$, $t_b$, etc., are column totals and $T$ is the grand total of the inter-
correlations in both dimensions after completing the diagonal entries. We may denote the sum
of all the factor loadings ($a_u = r_{au}$, $b_u = r_{bu}$, etc.) as

$$S_u = r_{au} + r_{bu} + r_{cu} + r_{du}$$
By inspection we see that
$$t_a = r_{au} \cdot S_u$$
$$T = t_a + t_b + t_c + t_d = S_u (r_{au} + r_{bu} + r_{cu} + r_{du}) = S_u^2$$
$$\therefore r_{au} = \frac{t_a}{S_u} = \frac{t_a}{\sqrt{T}} \qquad \text{(iv)}$$


To find $a_u$, $b_u$, etc., we may thus proceed iteratively as follows :

(a) guess values of $a_u^2 = r_{au}^2$, $b_u^2 = r_{bu}^2$, etc., by assigning the diagonal cell numbers which
fall into step with the hierarchical pattern ;
(b) sum the columns and add these to get the grand total ;
(c) divide each column total by the square root of the grand total.

If our initial guesses are good, (c) should give values $a_u$, $b_u$, etc., whose squares tally closely
with the assigned diagonal entries, some being a little higher and others a little lower. Indeed,
it may well happen that we can now reconstruct from the products ($r_{ab} = a_u \cdot b_u$, etc.) of these
first approximations to the factor loadings a matrix of which no cell entry differs significantly
from the original one based on the observed data. Otherwise, we may adjust as follows. Ceteris
paribus, an assigned $a_u^2$ if too high will make $T$ too high and hence $t_a = a_u \sqrt{T}$ also too high, as
will be seen by reference back to the observed data after addition of the assigned $a_u^2$ to complete
the column total. Accordingly, we next try out a somewhat lower value of $a_u^2$, and proceed
(mutatis mutandis) to deal with the other columns in the same way. Usually, a second guess
will be good enough.

Another property of the single factor correlation matrix is a useful check on the computation
at each stage. If we denote by $T_o$ the grand total of the cell entries in the matrix of observational
data, and by $D$ the sum of the diagonal terms, the following relation holds good for the completed
matrix :

$$T_o + D = S_u^2$$

Thus the sum of our assigned diagonal entries added to the sum of our observational entries
should correspond closely to the square of the sum of the assigned factor loadings, if our guess
is good.
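
Steps (a) to (c) and the closing check can be sketched as follows. This is an illustration under stated assumptions, not the procedure of the text verbatim: the first guess for each diagonal cell is taken to be the largest off-diagonal entry of its column, and each later guess is simply the square of the loading just obtained, which mechanises the manual adjustment described above. The loadings 0.9, 0.8, 0.7, 0.6 are hypothetical and generate an exact single factor pattern.

```python
import math

def column_total_loadings(corr, diag):
    """Steps (b) and (c): column totals divided by the root of the grand total."""
    k = len(corr)
    full = [[diag[i] if i == j else corr[i][j] for j in range(k)] for i in range(k)]
    totals = [sum(full[i][j] for i in range(k)) for j in range(k)]
    grand = sum(totals)
    return [t / math.sqrt(grand) for t in totals]

def iterate_loadings(corr, passes=100):
    k = len(corr)
    # Step (a), assumed starting guess: largest off-diagonal entry in each column.
    diag = [max(corr[i][j] for i in range(k) if i != j) for j in range(k)]
    loadings = column_total_loadings(corr, diag)
    for _ in range(passes):
        diag = [a * a for a in loadings]          # next guess: squared loadings
        loadings = column_total_loadings(corr, diag)
    return loadings

true = [0.9, 0.8, 0.7, 0.6]                       # hypothetical loadings
corr = [[true[i] * true[j] if i != j else 0.0 for j in range(4)] for i in range(4)]
est = iterate_loadings(corr)

# Check: observational total plus assigned diagonal equals (sum of loadings)^2.
t_obs = sum(corr[i][j] for i in range(4) for j in range(4) if i != j)
check = t_obs + sum(a * a for a in est) - sum(est) ** 2
print([round(a, 4) for a in est], round(check, 10))
```

For observed data which only approximate a single factor pattern the same loop may be stopped after one or two passes, as the text suggests.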


18.08 THE BI-FACTOR PATTERN


As we shall later see, factor analysis is like analysis of variance in more ways than one, and 
especially in so far as there is: (a) no room for doubt about its limited usefulness in certain 
situations ; (b) ample justification for suspicion of some claims put forward on its behalf. The
recognition of a single-factor pattern is an example of what we can do without making arbitrary 
assumptions which we cannot justify by reference to the data ; and a second pattern, called the 
bi-factor, is another. 

It may happen that a set of intercorrelations which do not exhibit close conformity to the 
hierarchical principle is classifiable if we group the tests in such a way that members of the 
same group are highly correlated inter se but weakly with members of other groups. If there 
are more than 2 such groups and at least 4 members in each, we are now in a position to proceed 
with confidence to explore the consequences of the hypothesis that 
(a) all the tests are correlated in virtue of a single factor common to all ;


(b) members of the same group are more highly correlated in virtue of a factor common
to and peculiar to the group.


The reader should now find it easy to translate these postulates in terms of the Umpire 
Bonus Model. Though a satisfactory vindication of the bi-factor pattern by the procedure we 
shall now examine calls for at least 12 tests, it will suffice for illustrative purposes, if we 
formulate a schema for standard scores of 10 tests involving 3 groups after correction w.r.t. 
reliability as follows : 


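$$z_a = a_1 z_1 + a_2 z_2 + a_s z_{sa}$$
$$z_b = b_1 z_1 + b_2 z_2 + b_s z_{sb}$$
$$z_c = c_1 z_1 + c_2 z_2 + c_s z_{sc}$$
$$z_d = d_1 z_1 + d_3 z_3 + d_s z_{sd}$$
$$z_e = e_1 z_1 + e_3 z_3 + e_s z_{se}$$
$$z_f = f_1 z_1 + f_3 z_3 + f_s z_{sf}$$
$$z_g = g_1 z_1 + g_3 z_3 + g_s z_{sg}$$
$$z_h = h_1 z_1 + h_4 z_4 + h_s z_{sh}$$
$$z_j = j_1 z_1 + j_4 z_4 + j_s z_{sj}$$
$$z_k = k_1 z_1 + k_4 z_4 + k_s z_{sk}$$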


For such a schema the correlation matrix is as in Table 4. 

Provided that there are at least three groups as in Table 4, in which they are (ABC),
(DEFG), (HJK), we can form at least one triad involving only intercorrelations of members
of different groups, e.g. :

$$\frac{r_{ad} \cdot r_{ak}}{r_{dk}} = \frac{a_1 d_1 \cdot a_1 k_1}{d_1 k_1} = a_1^2$$

If each group contains at least 2 members we can test the result, since we can then extract at
least one other equivalent triad, e.g. :

$$\frac{r_{ae} \cdot r_{aj}}{r_{ej}} = \frac{a_1 e_1 \cdot a_1 j_1}{e_1 j_1} = a_1^2$$

Having provisionally vindicated the possibility of extracting values of $a_1$, $b_1$, etc., with good
agreement between the several estimates of each, we are now in a position to make a matrix
exhibiting what the approximate values of the intercorrelations would be, if there were no group
factors, i.e. if they arose entirely from the common factor. The first, fourth and last row would
then read in conformity with the product rule for the single factor pattern :

$a_1^2$    $a_1 b_1$  $a_1 c_1$  $a_1 d_1$  $a_1 e_1$  $a_1 f_1$  $a_1 g_1$  $a_1 h_1$  $a_1 j_1$  $a_1 k_1$
$a_1 d_1$  $b_1 d_1$  $c_1 d_1$  $d_1^2$    $d_1 e_1$  $d_1 f_1$  $d_1 g_1$  $d_1 h_1$  $d_1 j_1$  $d_1 k_1$
$a_1 k_1$  $b_1 k_1$  $c_1 k_1$  $d_1 k_1$  $e_1 k_1$  $f_1 k_1$  $g_1 k_1$  $h_1 k_1$  $j_1 k_1$  $k_1^2$

On subtracting cell entries of this matrix from the corresponding ones of Table 4 we now have
a matrix of residuals shown in Table 5.
Thus the entries for the between-group correlations in our matrix of residuals give us an
additional check on our procedure, viz. none of these residuals should differ significantly from
zero. If the result confirms our supposition, we may proceed to assign the values of the group
factors $a_2$, $e_3$, $k_4$, etc., by the method of triad summation implicit in (iii) of 18.07. With appro-
priate modification the same procedure is adaptable to the extraction of pooled values of the
common factor, i.e. to extract a principal factor loading $a_1$ in group I we form: (a) the numerator
by summing all products in the A row between cell entries of the original matrix after excluding
those which belong to the same group as A itself; (b) the denominator by adding, from one
side of the diagonal, all intercorrelations involving two members of different groups other than
the A-group.

An exacting vindication of the bi-factor set-up will demand that the group residuals are
consistent with a single factor pattern, i.e. within a group tetrad differences vanish and the triad
ratios are consistent. The extraction of the residual matrix by the procedure outlined above
itself presupposes that there are at least three group factors, and we have no check on the first
step unless each group contains at least 2 members. To validate the last stage we require at
least 4 members in each group, since three would provide only a single estimate of any one factor
loading.
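
The successive checks of the bi-factor procedure can be illustrated on an artificial battery. The sketch below assumes hypothetical loadings for ten tests in the three groups (ABC), (DEFG), (HJK); the names `LOADINGS`, `corr` and `residual` are inventions for this example. For brevity the residuals are formed from the known general loadings, whereas in practice one would use the triad estimates.

```python
# Hypothetical loadings: test -> (group, general-factor loading, group-factor loading).
LOADINGS = {
    "A": ("I", 0.8, 0.4), "B": ("I", 0.7, 0.5), "C": ("I", 0.6, 0.3),
    "D": ("II", 0.7, 0.5), "E": ("II", 0.6, 0.6), "F": ("II", 0.5, 0.4),
    "G": ("II", 0.8, 0.3),
    "H": ("III", 0.6, 0.5), "J": ("III", 0.7, 0.4), "K": ("III", 0.5, 0.6),
}

def corr(x, y):
    """Intercorrelation implied by the bi-factor schema (errors already removed)."""
    gx, cx, sx = LOADINGS[x]
    gy, cy, sy = LOADINGS[y]
    r = cx * cy                     # contribution of the factor common to all tests
    if gx == gy:
        r += sx * sy                # contribution of the group factor
    return r

def residual(x, y):
    """Cell entry left after removing the common-factor product."""
    return corr(x, y) - LOADINGS[x][1] * LOADINGS[y][1]

# Two between-group triads, each estimating the squared general loading of A.
a1_sq_first = corr("A", "D") * corr("A", "K") / corr("D", "K")
a1_sq_second = corr("A", "E") * corr("A", "J") / corr("E", "J")

# Between-group residuals should vanish.
between = [residual(x, y) for x in "ABC" for y in "DEFGHJK"]

# A within-group triad of residuals estimates the squared group loading of A.
a2_sq = residual("A", "B") * residual("A", "C") / residual("B", "C")
print(a1_sq_first, a1_sq_second, max(abs(v) for v in between), a2_sq)
```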


18.09 HIGHER FACTOR PATTERNS 


In the foregoing treatment of the bi-factor pattern we postulate a single common factor and 
3 group factors. We then have three groups of tests each with two common factors. We shall 
now ask : is there any procedure by which we could validate our assumptions, if the data suggest 
the existence of only 2 group factors ? If so, we have 2 groups of tests, with 2 and only 2 factors 
common to members of the same group. This feature of the bi-factor set-up prompts us to ask : 
is there any criterion analogous to the tetrad principle to define a set of intercorrelations involving 
the same two sources of variation and no other? 

An answer to this question is obtainable by recourse to elementary algebra as for the 
derivation of the tetrad equations in 18.05 ; but the procedure is laborious. The advantage of 
side-stepping the labour entailed by use of grid algebra in connexion with this problem and 
a fortiori in connexion with the exploration of more complex factor patterns explains the pro- 
minent role which matrix algebra plays in expositions of factor analysis, and hence also the 
prevalent jargon of factor-space and other geometrical metaphors suggested by matrix operations ;
but the truth is that the logical assumptions underlying factor analysis are explicable without 
reliance upon it and to all except the mathematically proficient easier to grasp in the language 
of the more elementary mathematics employed in foregoing sections of this chapter. Factor 
analysis presupposes a system of score components expressible as simultaneous linear equations
of the type we have met with in connexion with the Umpire Bonus Model. If there are only 
2 or 3 relevant variables, recourse to determinants has little value as a labour saving device. 
There has therefore been no good reason for making the assumptions implicit in factor analysis
less accessible to readers not at home with determinants and/or matrices by invoking their aid
in what has gone before. At this stage, we do so merely because the algebraic treatment of the 
problem raised above is otherwise very laborious. There is no indispensable tie-up between 
the logic of factor analysis and the matrix algebra which most expositions of the procedure invoke 
at an introductory stage as a prerequisite to understanding it. 

Before proceeding further let us recall the structure of a tetrad equation, e.g.

$$r_{ab} \cdot r_{qk} - r_{ak} \cdot r_{qb} = 0$$

In this expression $r_{ab}$ and $r_{ak}$ are cell entries of the same (A-) row, $r_{ak}$ and $r_{qk}$ are cell entries
of the same (K-) column, $r_{ab}$ and $r_{qb}$ occur in the same (B-) column, while $r_{qb}$ and $r_{qk}$ occur
in the same (Q-) row. Thus the 4 relevant entities occur as cell entries at the apices of a rect-
angular segment of the matrix and as such constitute a minor of the first order, viz. :

$$\begin{vmatrix} r_{ab} & r_{ak} \\ r_{qb} & r_{qk} \end{vmatrix} = r_{ab} \cdot r_{qk} - r_{ak} \cdot r_{qb}$$

We might therefore express the tetrad rule derived in 18.04 by saying that one common
factor and only one suffices to specify a matrix of intercorrelations if every determinant minor
of the first order vanishes.

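This suggests a more general rule due to Thurstone. For our purpose it will suffice to
demonstrate it for a group of tests each pair of which involves 2 common factors and two only.
The rule then states that all 2nd order minors of the correlation matrix must vanish. Below
we set out such a minor : (a) in terms of the cell entries ; (b) in terms of the loading factors
prescribed in 18.06 and 18.07.

$$\Delta_2 = \begin{vmatrix} r_{be} & r_{bk} & r_{bm} \\ r_{ge} & r_{gk} & r_{gm} \\ r_{je} & r_{jk} & r_{jm} \end{vmatrix} = \begin{vmatrix} b_1 e_1 + b_2 e_2 & b_1 k_1 + b_2 k_2 & b_1 m_1 + b_2 m_2 \\ g_1 e_1 + g_2 e_2 & g_1 k_1 + g_2 k_2 & g_1 m_1 + g_2 m_2 \\ j_1 e_1 + j_2 e_2 & j_1 k_1 + j_2 k_2 & j_1 m_1 + j_2 m_2 \end{vmatrix}$$

We may reduce the determinant on the right as follows :

$$\frac{\Delta_2}{e_1 k_1 m_1} = \begin{vmatrix} b_1 + b_2 \dfrac{e_2}{e_1} & b_1 + b_2 \dfrac{k_2}{k_1} & b_1 + b_2 \dfrac{m_2}{m_1} \\ g_1 + g_2 \dfrac{e_2}{e_1} & g_1 + g_2 \dfrac{k_2}{k_1} & g_1 + g_2 \dfrac{m_2}{m_1} \\ j_1 + j_2 \dfrac{e_2}{e_1} & j_1 + j_2 \dfrac{k_2}{k_1} & j_1 + j_2 \dfrac{m_2}{m_1} \end{vmatrix}$$

$$= \left(\frac{e_2}{e_1} - \frac{m_2}{m_1}\right)\left(\frac{k_2}{k_1} - \frac{m_2}{m_1}\right) \begin{vmatrix} b_2 & b_2 & b_1 + b_2 \dfrac{m_2}{m_1} \\ g_2 & g_2 & g_1 + g_2 \dfrac{m_2}{m_1} \\ j_2 & j_2 & j_1 + j_2 \dfrac{m_2}{m_1} \end{vmatrix}$$

The second line follows on subtracting the third column from each of the first two. The determinant
in the last expression has two identical columns, whence its numerical value is
zero and

$$\Delta_2 = 0$$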
This completes the proof that the 2nd order minors vanish, if all the intercorrelations are referable
to the same 2 common factors. The converse assertion is demonstrable. As an exercise, the
student may paint in the values of $r_{be}$, etc., for a 3-factor set-up, e.g.

$$r_{be} = b_1 e_1 + b_2 e_2 + b_3 e_3$$

We then find that $\Delta_2$ vanishes only if one of the factors is algebraically redundant. In short,
the full statement of Thurstone’s rule illustrates one of the uses of matrix algebra in the theory
of equations, i.e. to prescribe the redundancy or otherwise of one or more variables. Geometrical
terms employed in the theory of equations are suggestive to the mathematician who is already
at home with them ; but this is not essential to an appreciation of what a particular factor pattern
postulates. We could, of course, express Thurstone’s rule for the 2-factor pattern in the form
it takes when we expand the 2nd order determinant, viz. :

$$r_{be} r_{gk} r_{jm} + r_{bk} r_{gm} r_{je} + r_{bm} r_{ge} r_{jk} - r_{bm} r_{gk} r_{je} - r_{bk} r_{ge} r_{jm} - r_{be} r_{gm} r_{jk} = 0$$

The student who wishes to check this by lower school certificate algebra may do so. The attempt
will at least dispose of any lingering doubts concerning the advantages of a little familiarity 
with the use of determinants. It will also be a wholesome demonstration of the irrelevance of 
portentous excursions into matrix algebra as a preliminary to understanding the logical assump- 
tions which factor analysis prescribes. 
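
A numerical check of the rule may be less laborious than the algebra. The sketch below builds the minor for tests B, G, J (rows) and E, K, M (columns) from hypothetical loadings, first with 2 common factors and then with 3; the helper names `det3` and `minor` are inventions for this example.

```python
def det3(m):
    """Determinant of a 3 x 3 array by direct expansion."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def minor(row_loadings, col_loadings):
    """Cell entries r_xy as sums of products of the loadings on each common factor."""
    return [[sum(p * q for p, q in zip(r, c)) for c in col_loadings]
            for r in row_loadings]

# Hypothetical loadings on 2 common factors for tests B, G, J and E, K, M.
two_rows = [(0.5, 0.3), (0.4, 0.5), (0.3, 0.2)]
two_cols = [(0.6, 0.2), (0.2, 0.6), (0.3, 0.1)]

# The same tests with a third, non-redundant common factor added.
three_rows = [(0.5, 0.3, 0.2), (0.4, 0.5, 0.1), (0.3, 0.2, 0.6)]
three_cols = [(0.6, 0.2, 0.3), (0.2, 0.6, 0.1), (0.3, 0.1, 0.5)]

d2 = det3(minor(two_rows, two_cols))       # vanishes apart from rounding error
d3 = det3(minor(three_rows, three_cols))   # does not vanish
print(d2, d3)
```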

The rule we have last examined raises the question, how many tests in a group suffice to 
identify a 2-factor pattern with due regard to the fact that our observational data do not furnish 
cell entries for the diagonal of the correlation matrix? For a single factor pattern we need 
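4 tests to provide a complete first order minor. This is evident, if we set out as below a 3-test
matrix on the left and a 4-test matrix on the right :

$$\begin{matrix} - & r_{ab} & r_{ac} \\ r_{ab} & - & r_{bc} \\ r_{ac} & r_{bc} & - \end{matrix} \qquad\qquad \begin{matrix} - & r_{ab} & r_{ac} & r_{ad} \\ r_{ab} & - & r_{bc} & r_{bd} \\ r_{ac} & r_{bc} & - & r_{cd} \\ r_{ad} & r_{bd} & r_{cd} & - \end{matrix}$$

Evidently, we cannot get a 2nd order minor from the 4-test matrix. Nor can we do so with
5 tests. We need at least 6, as below.

$$\begin{matrix} - & r_{ab} & r_{ac} & r_{ad} & r_{ae} & r_{af} \\ r_{ab} & - & r_{bc} & r_{bd} & r_{be} & r_{bf} \\ r_{ac} & r_{bc} & - & r_{cd} & r_{ce} & r_{cf} \\ r_{ad} & r_{bd} & r_{cd} & - & r_{de} & r_{df} \\ r_{ae} & r_{be} & r_{ce} & r_{de} & - & r_{ef} \\ r_{af} & r_{bf} & r_{cf} & r_{df} & r_{ef} & - \end{matrix}$$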


Theoretically, then, we can check up on the requirements of a bi-factor pattern involving 
one common and 2 group factors if each of the two groups contains at least 6 members or twelve 
in all. By applying Thurstone’s rule to each group and to the entire matrix we can establish 
the conclusions : (a) 2 factors suffice to explain correlations within the group since second 
order minors vanish ; (b) one factor suffices to explain correlations between groups since inter- 
group tetrads vanish, e.g. in the schema of 18.04, $r_{ad} \cdot r_{bf} = r_{af} \cdot r_{bd}$.

It remains to ask: can we assign to the common factor values from which we 
can reconstruct by subtraction (as in 18.08) a matrix of residuals having values consonant with 
the requirements that: (a) inter-group residuals are zero; (b) intra-group residuals are hier- 
archical ? To clarify the situation the student may set out in full the schema of inter-group 
correlations for the single common factor components of the matrix for a battery of 12 tests,
A-F being one group and G-M the other ; but it will suffice for illustrative purposes if we take
3 of one group and 3 of the other as below:

                   G            H            J            Total
       Factor      $g_1$        $h_1$        $j_1$        $s_g$
A      $a_1$       $a_1 g_1$    $a_1 h_1$    $a_1 j_1$    $t_a$
B      $b_1$       $b_1 g_1$    $b_1 h_1$    $b_1 j_1$    $t_b$
C      $c_1$       $c_1 g_1$    $c_1 h_1$    $c_1 j_1$    $t_c$
Total  $s_a$       $t_g$        $t_h$        $t_j$        $T$


Evidently the hierarchical principle obtains for all inter-group correlations, but the reader 
who is familiar with the theory of equations will find that such a schema yields too few inde- 
pendent equations to admit of a unique solution for the factor loadings. Since one exception 
suffices to dismiss the possibility, it is instructive to exhibit two complete solutions as below : 


Factor          (0.3)    (0.4)    (0.5)    Total (1.2)

(0.2)           0.06     0.08     0.10     0.24
(0.5)           0.15     0.20     0.25     0.60
(0.1)           0.03     0.04     0.05     0.12
Total (0.8)     0.24     0.32     0.40     0.96


Factor          (0.24)   (0.32)   (0.40)   Total (0.96)

(0.250)         0.06     0.08     0.10     0.24
(0.625)         0.15     0.20     0.25     0.60
(0.125)         0.03     0.04     0.05     0.12
Total (1.0)     0.24     0.32     0.40     0.96


The restrictive relations in the evaluation are all included in the system of equations exhibited
in the row and column totals of the schema

$$g_1 = \frac{t_g}{s_a} ; \quad h_1 = \frac{t_h}{s_a} ; \quad j_1 = \frac{t_j}{s_a} ;$$

$$a_1 = \frac{t_a}{s_g} ; \quad b_1 = \frac{t_b}{s_g} ; \quad c_1 = \frac{t_c}{s_g}$$

By starting with an arbitrary value of $s_a$, the reader may easily check that any set of values con-
sistent with the above works. For example, $s_a = 3$ implies that $s_g = 0.32$ to give $T = 0.96$.
We then have

$$a_1 = \frac{0.24}{0.32} = 0.75 ; \quad g_1 = \frac{0.24}{3} = 0.08 ; \quad r_{ag} = (0.08)(0.75) = 0.06, \text{ etc.}$$


Now any set of common factor loadings such as the above would lead to the same residual matrix 
from which to determine our group factor loadings. The situation is therefore this: we can 
recognise what the factor pattern is; but we cannot assign unique values to the factor loadings. 
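
The lack of a unique solution is easily exhibited by machine. The sketch below reproduces the two sets of loadings tabulated above and derives a third by rescaling so that the row loadings sum to 3, as in the worked figures; the function names are hypothetical.

```python
def inter_group(row_loadings, col_loadings):
    """Inter-group correlations as products of common-factor loadings."""
    return [[a * g for g in col_loadings] for a in row_loadings]

def same(m1, m2, tol=1e-12):
    """True if two matrices agree cell by cell within rounding error."""
    return all(abs(x - y) < tol for r1, r2 in zip(m1, m2) for x, y in zip(r1, r2))

def rescale(row_loadings, col_loadings, c):
    """Multiply one set of loadings by c and divide the other by c."""
    return [a * c for a in row_loadings], [g / c for g in col_loadings]

first = inter_group([0.2, 0.5, 0.1], [0.3, 0.4, 0.5])
second = inter_group([0.250, 0.625, 0.125], [0.24, 0.32, 0.40])

# Third solution: make the row loadings sum to 3 (they sum to 0.8 in the first).
rows3, cols3 = rescale([0.2, 0.5, 0.1], [0.3, 0.4, 0.5], 3 / 0.8)
third = inter_group(rows3, cols3)

print(same(first, second), same(first, third), rows3[0], cols3[0])
```

All three sets of loadings yield the same inter-group matrix, and hence the same residuals, although the third includes a value greater than unity.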

This lack of uniqueness is the pivot of a controversy which involves two issues. On the 
analysis, if the factor loadings they prescribe are arbitrary. On the other hand, it may be in- 
structive to recognise the existence of a factor pattern in the absence of a numerical specification 
of the loadings; and we have seen that this may be possible to accomplish when no unique 
numerical solution is realisable. If so, it is important to be clear about what we mean by
recognising the pattern.

Against the background of the Umpire Bonus Model, the reader who cares to pursue the 
topic will find the following hints helpful. Of two essentially different procedures subsumed by 
the term multiple factor analysis, that of Thurstone admits the possibility that: (a) a score 
referable to a particular test may contain a specific factor component ; (b) the number of non- 
specific factors may be as great as the number of different pairs of tests in the battery. In the 
idiom of the Umpire Bonus Model this means that each player’s total score contains some multiple 
—not excluding zero as a possible multiplier—of his individual score together with some multiple 
—not excluding zero as a possible multiplier—of each of different umpire contributions equal in 
number to the number of pairs of players. The initial formulation definitive of score value 
components thus leads to a system of fewer equations than variables. The solution sought by 
an iterative procedure is the most economical in the sense that it seeks to interpret consistently 
any inter-test correlation in terms of the least number of score components. 

The assumptions implicit in Hotelling’s procedure are on all fours with the rules of the 
model 3-wheel game prescribed at the end of 12.07. There the player is passive, each player’s 
score being made up of contributions from the same number of umpires not exceeding the number 
of different pairs of players in the initial formulation of the Hotelling set-up. The number of 
basic equations definitive of score components cannot therefore exceed the number of variables, a 
circumstance which confers on the game an aspect of greater algebraic propriety than that of 
Thurstone. Again, however, we must invoke a quite arbitrary axiom of economy in the search 
for a satisfactory selection of one among many consistent sets of factor loadings ; and the
prescribed procedure will lead, as pointed out by Godfrey Thomson, to different factor loadings if
we add a new test—and hence the possibility of n new factors to an n-fold test battery. 

Either method involves an issue which is outside the scope of mathematics as such.
Thurstone himself faces it frankly when he appeals to William of Occam’s principle: entia non
multiplicanda praeter necessitatem. According to his view the most economical factor loadings
are the best because economy of hypothesis is a canon of scientific method. This plea is open to
criticism at more than one level. The use of the term economy in scientific enquiry is not wholly 
unequivocal ; and what G. P. Meredith calls the epistemic status of the canon is itself debatable. 
As the writer sees it, a methodical scientific worker will rightly choose to investigate first the 
simpler of two hypotheses to forestall unnecessary waste of effort, if it stands the test of experience 
equally well. So interpreted the Occam principle embodies a wise code of procedure in the 
process of discovery ; but embodies no rational prescription for deciding whether one or the other 
hypothesis is true or false without appeal to the higher tribunal of the experimentum crucis. 




All such procedures referred to as factor analysis presume a strictly additive relation between 
the score components and zero covariance between any two of them. In the absence of con- 
firmatory evidence, this assumption, which is also inherent in the construction of the balance 
sheet of the analysis of variance, is highly arbitrary and sometimes grossly inappropriate. In 
Chapter 13, we have seen how the replication criterion may give us the opportunity of confirming 
the additive postulate when the end in view is a balance sheet of variance, and we have seen 
reason to believe that the same principle is both a necessary and sufficient condition of the 
validity of Thurstone’s rule and the tetrad criterion as a special case of it. Just as it is all too 
common practice to overlook the importance of the replication criterion, it is all too common 
to execute elaborate computations to extract factor loadings when there is : (a) no other clear-cut 
factor pattern to validate the initial assumption of the additivity of the score components and 
the twin postulate of zero covariance ; (b) no possibility of arriving at a solution preferable to 
others equally consistent with the data. 


CHAPTER 19


SAMPLING IN A FINITE UNIVERSE AND
MANIFOLD CLASSIFICATION


19.01 THE HYPERGEOMETRIC DISTRIBUTION 


We have already (Chapter 2, Vol. I) obtained the exact expression for sampling without replace-
ment in the domain of binary taxonomic scoring, viz. a distribution defined by successive terms
of the binomial in factorial powers :

$$\frac{(f + s)^{(r)}}{n^{(r)}} = \frac{(nq + np)^{(r)}}{n^{(r)}}$$

If the sampling fraction ($F$) defined by the ratio $nF = r$ is small and $n$ is very great, we have
seen (Chapter 3, Vol. I) that we may regard the distribution as approximately normal; but
we have not as yet explored the possibility of finding a satisfactory fitting curve when $n$ is great
if $F$ itself is not a small fraction. We shall now do so by recourse to the method of moments.
For reasons which we have seen in 14.04, it will be convenient to derive first a general expression
for the factorial moments, defined as

$$\mu_{(k)} = \frac{1}{n^{(r)}} \sum_{x=0}^{x=r} x^{(k)} \cdot \frac{r!}{x!\,(r-x)!} \cdot s^{(x)} f^{(r-x)} \qquad \text{(i)}$$

In this expression

$$r! = r^{(k)} (r-k)! ; \quad s^{(x)} = s^{(k)} (s-k)^{(x-k)} ; \quad \frac{x^{(k)}}{x!} = \frac{1}{(x-k)!}$$

$$\therefore \mu_{(k)} = \frac{r^{(k)} s^{(k)}}{n^{(r)}} \sum_{x=0}^{x=r} \frac{(r-k)!}{(x-k)!\,(r-x)!} (s-k)^{(x-k)} f^{(r-x)} \qquad \text{(ii)}$$

If we write $(r - k) = a$ and $(x - k) = b$:

$$\sum_{x=0}^{x=r} \frac{(r-k)!}{(x-k)!\,(r-x)!} (s-k)^{(x-k)} f^{(r-x)} = \sum_{b=-k}^{b=a} \frac{a!}{b!\,(a-b)!} (s-k)^{(b)} f^{(a-b)}$$

When $b < 0$, the reciprocal of $b!$ is zero, whence

$$\sum_{b=-k}^{b=a} \frac{a!}{b!\,(a-b)!} (s-k)^{(b)} f^{(a-b)} = \sum_{b=0}^{b=a} \frac{a!}{b!\,(a-b)!} (s-k)^{(b)} f^{(a-b)}$$
$$= (s + f - k)^{(a)}$$
$$= (n - k)^{(r-k)}$$

Whence by substitution in (ii)

$$\mu_{(k)} = \frac{r^{(k)} s^{(k)} (n-k)^{(r-k)}}{n^{(r)}} \qquad \text{(iii)}$$

In this expression

$$n^{(r)} = n^{(k)} (n-k)^{(r-k)},$$

$$\therefore \mu_{(k)} = \frac{r^{(k)} s^{(k)}}{n^{(k)}} \qquad \text{(iv)}$$
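
Formula (iv) can be confirmed by brute force for any small universe. The sketch below assumes a hypothetical universe of n = 30 items of which s = 12 belong to class A, and samples of r = 10 taken without replacement; the helper names are inventions for this example.

```python
from math import comb

def falling(x, k):
    """Factorial power x^(k) = x(x - 1)...(x - k + 1)."""
    out = 1
    for i in range(k):
        out *= x - i
    return out

def pmf(x, n, s, r):
    """Probability of x class-A items in an r-fold sample without replacement."""
    return comb(s, x) * comb(n - s, r - x) / comb(n, r)

def factorial_moment(k, n, s, r):
    """Direct summation of x^(k) times the frequency of x."""
    return sum(falling(x, k) * pmf(x, n, s, r) for x in range(r + 1))

n, s, r = 30, 12, 10          # hypothetical universe and sample size
pairs = []
for k in range(1, 5):
    direct = factorial_moment(k, n, s, r)
    closed = falling(r, k) * falling(s, k) / falling(n, k)   # formula (iv)
    pairs.append((direct, closed))
    print(k, round(direct, 6), round(closed, 6))
```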


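From (iv) we at once obtain the zero moments by the substitutions
$$\mu_2 = \mu_{(2)} + \mu_{(1)},$$
$$\mu_3 = \mu_{(3)} + 3\mu_{(2)} + \mu_{(1)},$$
$$\mu_4 = \mu_{(4)} + 6\mu_{(3)} + 7\mu_{(2)} + \mu_{(1)}$$
Whence we derive

$$\mu_1 = \frac{rs}{n} \qquad \text{(v)}$$

$$\mu_2 = \frac{rs}{n} + \frac{r^{(2)} s^{(2)}}{n^{(2)}} = \frac{rs(n + rs - r - s)}{n(n-1)} \qquad \text{(vi)}$$

$$\mu_3 = \frac{r^{(3)} s^{(3)}}{n^{(3)}} + \frac{3 r^{(2)} s^{(2)}}{n^{(2)}} + \frac{rs}{n} \qquad \text{(vii)}$$

$$\mu_4 = \frac{r^{(4)} s^{(4)}}{n^{(4)}} + \frac{6 r^{(3)} s^{(3)}}{n^{(3)}} + \frac{7 r^{(2)} s^{(2)}}{n^{(2)}} + \frac{rs}{n} \qquad \text{(viii)}$$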
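We can now obtain the mean moments in the usual way :
$$m_2 = \mu_2 - \mu_1^2$$
$$= \frac{rs(n + rs - r - s)}{n(n-1)} - \frac{r^2 s^2}{n^2}$$
$$= \frac{rs}{n^2(n-1)} (nrs - nr - ns + n + n^2 - n - nrs + rs)$$
$$= \frac{rs}{n^2(n-1)} (n - r)(n - s)$$
$$= \frac{(n - r)\, rsf}{n^2 (n-1)}$$
If we write $s = np$, $f = nq$ and $r = nF$
$$m_2 = \frac{n(1 - F)\, rpq}{n - 1}$$
Whence for large values of $n$
$$m_2 \simeq (1 - F)\, rpq \qquad \text{(ix)}$$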


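It will simplify the task of evaluating the third and fourth mean moments if we express
them in terms of factorial moments as below :

$$m_3 = \mu_3 - 3\mu_2 \mu_1 + 2\mu_1^3,$$
$$m_3 = \mu_{(3)} - 3\mu_{(2)}(\mu_{(1)} - 1) + \mu_{(1)}(\mu_{(1)} - 1)(2\mu_{(1)} - 1) \qquad \text{(x)}$$
$$m_4 = \mu_4 - 4\mu_3 \mu_1 + 6\mu_2 \mu_1^2 - 3\mu_1^4,$$
$$m_4 = \mu_{(4)} - 2\mu_{(3)}(2\mu_{(1)} - 3) + \mu_{(2)}(6\mu_{(1)}^2 - 12\mu_{(1)} + 7) + \mu_{(1)}(1 - \mu_{(1)})(3\mu_{(1)}^2 - 3\mu_{(1)} + 1) \qquad \text{(xi)}$$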
Whence we have

$$m_3 = \frac{rs(n - r)(n - s)(n - 2r)(n - 2s)}{n^3 (n-1)(n-2)} = \frac{n^2\, rpq(q - p)(1 - F)(1 - 2F)}{(n-1)(n-2)} \qquad \text{(xii)}$$

On the assumption that the universe is large, so that $(n - 2) \simeq n$, we therefore have
$$m_3 \simeq rpq(q - p)(1 - F)(1 - 2F) \qquad \text{(xiii)}$$
In the same way we obtain
$$m_4 = rpq(n - r)[n(n + 1) - 6r(n - r) + 3pq\{n^2(r - 2) - nr^2 + 6r(n - r)\}] \div (n - 1)^{(3)} \qquad \text{(xiv)}$$
$$\therefore m_4 \simeq rpq(1 - F)[1 - 6F(1 - F)(1 - 3pq) + 3pq(r - 2 - nF^2)] \qquad \text{(xv)}$$
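
The exact expressions for the mean moments can be checked against direct summation over the non-replacement distribution. The sketch below assumes a hypothetical universe with n = 40, s = 10 and r = 12, and uses the exact forms of the second moment and of (xii) and (xiv) rather than their large-universe approximations; the names are inventions for this example.

```python
from math import comb

def central_moment(j, n, s, r):
    """j-th moment about the mean, summed directly over the distribution."""
    probs = [comb(s, x) * comb(n - s, r - x) / comb(n, r) for x in range(r + 1)]
    mean = sum(x * p for x, p in enumerate(probs))
    return sum((x - mean) ** j * p for x, p in enumerate(probs))

n, s, r = 40, 10, 12          # hypothetical universe and sample size
p, q = s / n, 1 - s / n

closed = {
    2: r * s * (n - r) * (n - s) / (n ** 2 * (n - 1)),
    3: r * s * (n - r) * (n - s) * (n - 2 * r) * (n - 2 * s)
       / (n ** 3 * (n - 1) * (n - 2)),
    4: r * p * q * (n - r)
       * (n * (n + 1) - 6 * r * (n - r)
          + 3 * p * q * (n ** 2 * (r - 2) - n * r ** 2 + 6 * r * (n - r)))
       / ((n - 1) * (n - 2) * (n - 3)),
}
direct = {j: central_moment(j, n, s, r) for j in (2, 3, 4)}
for j in (2, 3, 4):
    print(j, round(direct[j], 6), round(closed[j], 6))
```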
All the expressions for the moments cited above reduce to those of the replacement distribution,
when $F = 0$ as must be true of a finite $r$-fold sample from an infinite ($n = \infty$) universe. The
appearance of the factor $(1 - 2F)$ in (xiii) shows that the third moment vanishes when the
sampling fraction is $\frac{1}{2}$. In fact, all odd moments then vanish, as we can see from the following
considerations. Let us consider the frequencies ($y_1$ and $y_2$) referable to score values $x_1$ and $x_2$
equidistant from the mean on either side of it, so that we put $x_1 = (M - a)$ and $x_2 = (M + a)$.
If $F = \frac{1}{2}$, we have $(n - s) = f = (2r - s)$, and

$$\frac{y_1}{y_2} = \frac{s^{(x_1)} (2r - s)^{(r - x_1)}}{x_1!\,(r - x_1)!} \cdot \frac{x_2!\,(r - x_2)!}{s^{(x_2)} (2r - s)^{(r - x_2)}}$$

$$= \frac{s^{(M-a)} (2r - s)^{(r - M + a)}\,(M + a)!\,(r - M - a)!}{(M - a)!\,(r - M + a)!\; s^{(M+a)} (2r - s)^{(r - M - a)}}$$

On substituting in accordance with the formula $c^{(x)} \cdot (c - x)! = c!$, this reduces to

$$\frac{y_1}{y_2} = \frac{(M + a)!\,(s - M - a)!\,(r - s + M + a)!\,(r - M - a)!}{(M - a)!\,(s - M + a)!\,(r - s + M - a)!\,(r - M + a)!}$$

Since $M = rp$ and $s = np$, the condition $F = \frac{1}{2}$ implies that $M = \frac{1}{2}s$, whence

$$\frac{y_1}{y_2} = \frac{(\frac{1}{2}s + a)!\,(\frac{1}{2}s - a)!\,(r - \frac{1}{2}s + a)!\,(r - \frac{1}{2}s - a)!}{(\frac{1}{2}s - a)!\,(\frac{1}{2}s + a)!\,(r - \frac{1}{2}s - a)!\,(r - \frac{1}{2}s + a)!} = 1$$

Thus the frequencies of any two score values equidistant from the mean are identical when
$F = \frac{1}{2}$ and the $r$-fold sample distribution is symmetrical whether $p$ is equal to $q$, greater than
$q$ or less than $q$.
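
The symmetry at a sampling fraction of one half can likewise be verified numerically. The sketch below assumes a hypothetical universe of n = 20 with s = 6 (so that p = 0.3 is unequal to q) and a sample of r = 10.

```python
from math import comb

def pmf(x, n, s, r):
    """Frequency of x class-A items in an r-fold sample without replacement."""
    return comb(s, x) * comb(n - s, r - x) / comb(n, r)

n, s = 20, 6                  # hypothetical universe, p = 0.3
r = n // 2                    # sampling fraction F = 1/2

probs = [pmf(x, n, s, r) for x in range(s + 1)]
mean = sum(x * p for x, p in enumerate(probs))
third = sum((x - mean) ** 3 * p for x, p in enumerate(probs))

# Scores equidistant from the mean s/2 are x and s - x.
symmetric = all(abs(probs[x] - probs[s - x]) < 1e-15 for x in range(s + 1))
print(mean, symmetric, third)
```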
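Thus $\beta_1$ vanishes if $F = \frac{1}{2}$, just as it also vanishes if $p = q$, i.e. $s = \frac{1}{2}n$. When
$F = \frac{1}{2}$, so that $nF^2 = \frac{1}{2}r$, (ix) and (xv) reduce to

$$m_2 = \tfrac{1}{2}\, rpq ;$$
$$m_4 = \tfrac{1}{4}\, rpq\,\{3pq(r - 1) - 1\} ; \qquad \text{(xvi)}$$
$$\therefore \beta_2 = \frac{m_4}{m_2^2} = 3 - \frac{3pq + 1}{rpq}$$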
This has the same form as the second Pearson coefficient defined by (xx) in 15.04 for the symmet-
rical B(j, k) variate of restricted range (Type II). This we have seen to be
$$\beta_2 = 3 - \frac{6}{2j + 3}$$
This is equivalent to the above when
$$j = \frac{6pq(r - 1) - 3(pq + 1)}{6pq + 2}$$

To satisfy the requirements of Type IT, it is also necessary to show that 


2 and m= ES 
y -AGAD 
2 
pd 
F. 2m, p 


In our expression $\mu_1 = rp$ and $2m_2 = rpq$ when $F = \frac{1}{2}$, so that the above also implies the
relation 


ee | 
Ds 


sp 
- Erie 
q 
When $p = q$, the distribution is again symmetrical regardless of the size of the sampling
fraction, so that

$$\beta_1 = 0 \quad \text{and} \quad \beta_2 = 3 - \frac{6F(1 - F) + 2}{r(1 - F)}$$

This again is consistent with one of the requirements of the Type II distribution if we write

$$j = \frac{3r(1 - F)}{6F(1 - F) + 2} - \frac{3}{2}$$

More generally, when neither $p$ nor $F$ is equal to $\frac{1}{2}$, the form of the $\beta$ coefficients may conform
with Type I requirements. The determination of the constants is laborious, and the student
who wishes to pursue the topic may consult Pearson’s tables of the Incomplete Beta Function.


19.02 MOMENTS OF A SCORE-SUM DISTRIBUTION


The results derived in 19.01 are obtainable from more general expressions for the moments 
of a score-sum distribution in the domain of representative scoring without replacement. To 
establish them, a digression is necessary. It is the peculiarity of binary taxonomic scoring that 
the algebraic form of the unit sample distribution is inherent in the statement of the problem. 
Representative scoring—except in the limiting case of a 2-class system—raises a new problem. 
In the foregoing section, we have determined the moments of the non-replacement distribution 
of samples from a binary universe to obtain an approximate expression for the distribution of 
the r-fold sample. In what follows we may remind ourselves that a continuous distribution can 
give a satisfactory description of a universe only if the number of score classes is very large, and 
hence that we may usually disregard the consequences of non-replacement. We therefore 
assume that the unit sample distribution is discrete. In practice, we may likewise assume the 
unlikelihood that our manifold universe of scores closely conforms to any known discrete sampling 
distribution such as the rectangular or the distribution defined by successive terms of a binomial 
or of a Vandermonde expansion. It will suffice to postulate that we have empirical sources of
information from which to determine the moments of the unit sample distribution, and on that
understanding we shall derive an expression for the moments of the r-fold distribution without 
recourse to other data. 

Subject to the replacement condition or to the postulate that the size of the universe and/or 
sampling fraction permits us to disregard it, we have obtained expressions of this sort in 14.05 
by the method of iteration. We shall now show how it is possible to derive them by a different 
procedure which is adaptable to situations in which : (a) there is no replacement ; (b) extraction 
of the sample materially changes the composition of the parent universe. 

We customarily define the representative score of a sample by the mean of the constituent 
individual scores, i.e. by the quotient of their sum and the sample size. The latter being constant 
for a sample of given size is immaterial to a specification of the distribution, since all the
parameters of the distribution of the mean are obtainable from those of the sum ($S_r$) of the
r-fold sample by a scalar change involving r alone. We can regard the r-fold sample as r
successively extracted unit samples of score $x_u$, so that

$$S_r = x_1 + x_2 + x_3 \ldots + x_r.$$

By definition the kth zero moment of the score-sum is the expected value of $S_r^k$; and we may
write this as

$$\mu_k(S_r) = E(x_1 + x_2 + x_3 \ldots + x_r)^k.$$

An examination of the moments of the 3-fold sample distribution brings into focus what we
need to know in order to evaluate an expression such as the above. Thus we have

$$\mu_2(S_3) = E(x_1 + x_2 + x_3)^2 = E(x_1^2) + E(x_2^2) + E(x_3^2) + 2E(x_1 \cdot x_2) + 2E(x_1 \cdot x_3) + 2E(x_2 \cdot x_3).$$


Now the subscripts we attach to x in this expression refer to the order in which we extract the
unit samples, regardless of the numerical value any individual unit score $x_u$ or $x_v$ may have.
This expression therefore contains terms of two sorts: (a) squares of unit scores; (b) products
of unit scores whose numerical values may be the same or different. If we replace each item
to which we attach a score before drawing another it is evident that: (i) the expected value
of the square of the unit score does not depend on the value of the subscript u or v; (ii) the
expected value of the product of different unit scores is likewise independent of the particular
values assigned to u and to v. Whence we may write the last expression in the form

$$\mu_2(S_3) = 3E(x_u^2) + 6E(x_u \cdot x_v).$$

Similarly we may write

$$\mu_3(S_3) = E(x_1^3) + E(x_2^3) + E(x_3^3) + 3E(x_1^2 \cdot x_2) + 3E(x_1^2 \cdot x_3) + 3E(x_2^2 \cdot x_1) + 3E(x_2^2 \cdot x_3) + 3E(x_3^2 \cdot x_1) + 3E(x_3^2 \cdot x_2) + 6E(x_1 \cdot x_2 \cdot x_3),$$

$$\therefore\ \mu_3(S_3) = 3E(x_u^3) + 18E(x_u^2 \cdot x_v) + 6E(x_u \cdot x_v \cdot x_w).$$

In these expressions $E(x_u^2)$ and $E(x_u^3)$ are respectively the expected values of the square
and the cube of the unit sample score, i.e. the second and third zero moments of the unit sample
distribution; and we may write these as $\mu_2$ and $\mu_3$ respectively. We may speak of a k-fold
co-moment as the expected value of the product of the unit scores of a k-fold sub-sample, and write

$$E(x_u \cdot x_v) = \mu_{1.1}; \qquad E(x_u^2 \cdot x_v) = \mu_{2.1}; \qquad E(x_u \cdot x_v \cdot x_w) = \mu_{1.1.1}.$$

In this symbolism, and subject to the replacement condition unless we can subsequently show
its irrelevance,

$$\mu_2(S_3) = 3\mu_2 + 6\mu_{1.1};$$

$$\mu_3(S_3) = 3\mu_3 + 18\mu_{2.1} + 6\mu_{1.1.1}.$$

When replacement does occur we may write

$$E(x_u^2 \cdot x_v) = E(x_u^2)\,E(x_v); \qquad E(x_u \cdot x_v \cdot x_w) = E(x_u)\,E(x_v)\,E(x_w),$$

$$\therefore\ \mu_{j.k} = \mu_j \cdot \mu_k \quad \text{and} \quad \mu_{j.k.l} = \mu_j \cdot \mu_k \cdot \mu_l.$$

Hence we may write

$$\mu_2(S_3) = 3\mu_2 + 6\mu_1^2;$$

$$\mu_3(S_3) = 3\mu_3 + 18\mu_2\mu_1 + 6\mu_1^3.$$

In the same symbolism, as the reader may check by expanding $(x_1 + x_2 + x_3)^4$,

$$\mu_4(S_3) = 3\mu_4 + 24\mu_3\mu_1 + 18\mu_2^2 + 36\mu_2\mu_1^2.$$
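The three expressions for the 3-fold sample with replacement can be checked by brute force. The sketch below is only an illustration: the six-card pack and the function names are assumptions introduced here, not part of the text. It enumerates every ordered 3-fold sample and compares the enumerated zero moments of the score-sum with the formulae above, using exact fractions.

```python
from fractions import Fraction
from itertools import product

def zero_moment(scores, k):
    """k-th zero moment (moment about the origin) of the unit sample distribution."""
    return Fraction(sum(x**k for x in scores), len(scores))

def score_sum_zero_moment(scores, r, k):
    """k-th zero moment of the r-fold score-sum, sampling WITH replacement,
    obtained by enumerating all len(scores)**r ordered samples."""
    total = sum(sum(sample)**k for sample in product(scores, repeat=r))
    return Fraction(total, len(scores)**r)

# Hypothetical universe: six cards with scores 1, 2, 3 in the ratio 1:3:2.
UNIVERSE = [1, 2, 2, 2, 3, 3]
MU1, MU2, MU3, MU4 = (zero_moment(UNIVERSE, k) for k in (1, 2, 3, 4))

# Right-hand sides of the expressions for the 3-fold sample.
SECOND = 3*MU2 + 6*MU1**2
THIRD = 3*MU3 + 18*MU2*MU1 + 6*MU1**3
FOURTH = 3*MU4 + 24*MU3*MU1 + 18*MU2**2 + 36*MU2*MU1**2

if __name__ == "__main__":
    for k, rhs in ((2, SECOND), (3, THIRD), (4, FOURTH)):
        print(k, score_sum_zero_moment(UNIVERSE, 3, k), rhs)
```

Each printed pair should agree exactly, since the coefficients 3, 6, 18, 24 and 36 merely count terms of the same order in the expansion.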


These considerations suggest the possibility of obtaining general expressions for the zero moments 
of the r-fold score-sum of a replacement distribution if we can enumerate terms involving the 
same set of exponents in the expansion of the expression $(x_1 + x_2 + x_3 \ldots x_m)^n$. It will clarify
the issue if we re-examine the derivation of the multinomial theorem by recourse to the 
chessboard device. 


19.03 CHESSBOARD DERIVATION OF THE MULTINOMIAL THEOREM


The chessboard device is at once a replica of the algorithm of multiplication and a means of 
exhibiting all possible permutations consistent with repetition. Our use of it to derive the 
binomial sample distribution in Chapter 1 of Vol. I is a particular case of its successive application 
to exhibit the build up of the multinomial, as below 


          aa    ab    ac    ba    bb    bc    ca    cb    cc
    a |   aaa   aab   aac   aba   abb   abc   aca   acb   acc
    b |   baa   bab   bac   bba   bbb   bbc   bca   bcb   bcc
    c |   caa   cab   cac   cba   cbb   cbc   cca   ccb   ccc

If we take out all terms in the above with identical factors regardless of order we may classify
them as follows

    aaa = a³ ;  bbb = b³ ;  ccc = c³

    aab + aba + baa = 3a²b

    aac + aca + caa = 3a²c

    bba + bab + abb = 3b²a

    bbc + bcb + cbb = 3b²c

    cca + cac + acc = 3c²a

    ccb + cbc + bcc = 3c²b

    abc + acb + bac + bca + cab + cba = 6abc.


When, as is customary, we collect our terms in this way, the appropriate numerical coefficient 
of each one is implicit in the law of generation inherent in successive application of the chess- 
board lay-out. We may state it thus. ‘The exponent of each basic term (a, b, c, etc.) is the number 
of times the latter appears as a factor in the product. In the expansion of $(a + b + c \ldots)^n$
each product will have n factors in this sense; and the number of identical factors may be
1, 2, 3, . . . m if m is the number of basic terms. If u, v, w, etc., signify the exponents of a, b, c,
etc., they therefore respectively represent how many times a, b, c appear as factors in the product,
and their sum is n. The numerical coefficient of each product with the same build-up regardless
of order is simply the number of linear permutations consistent with its build-up; and the
number of different linear arrangements consistent with the build-up of a term of the form
$a^u b^v c^w$ is the number of ways in which we can set out in a row n cards classifiable as three
classes respectively composed of u, v, and w members. This is given by the familiar formula

$${}^{n}P_{u.v.w} = \frac{n!}{u!\,v!\,w!} \tag{i}$$

Thus we speak of ${}^{n}P_{u.v.w}$ so defined as the coefficient of the general term of the multinomial
of nth degree. As we have already seen in Chapter 1 of Vol. I, the sum of all the numerical
coefficients so defined is $m^n$, as is evident from the identity

$$m^n = (1 + 1 + 1 \ldots m \text{ times})^n = \sum_{u=0}^{n} \sum_{v=0}^{n} \cdots \frac{n!}{u!\,v!\,w! \ldots} \qquad (u + v + w \ldots = n) \tag{ii}$$

If a= b =c, etc., in the multinomial (a + b +c...)”, it will be evidently convenient 
to carry our classification a step further by collecting products of the same order, i.e. with identical 
exponents regardless of the component (and then numerically equivalent) basic terms. For 
instance we might write out the expansion of $(a + b + c)^3$ in accordance with the following
schema:

$$a^3 + b^3 + c^3 = 3h^3;$$

$$3a^2b + 3a^2c + 3b^2a + 3b^2c + 3c^2a + 3c^2b = 18h^2i;$$

$$6abc = 6hij.$$

Similarly, we should write

$$(a + b + c)^2 = 3h^2 + 6hi;$$

$$(a + b + c + d)^4 = 4h^4 + 48h^3i + 36h^2i^2 + 144h^2ij + 24hijk.$$

Necessarily, the rule exhibited in (ii) holds good since we have merely collected terms with the
common property that they contain the same assemblage of exponents. Thus the sum of the
coefficients of $(a + b + c + d)^4$ is $256 = 4^4$, and that of $(a + b + c)^2$ is $9 = 3^2$. The general
pattern for an expansion in terms of the same order defined as above is then

$$(a + b + c \ldots m \text{ terms})^n = K_n h^n + K_{(n-1).1}\, h^{n-1} i + K_{(n-2).2}\, h^{n-2} i^2 + K_{(n-2).1.1}\, h^{n-2} i j, \text{ etc.}$$

We may write the general term of this expansion in the form

$$K_{p.q.r \ldots}\; h^p \cdot i^q \cdot j^r \ldots \tag{iii}$$


In 19.02 we have already seen how occasion may arise in statistical theory to make use of the
number of products of the same order, i.e. the numerical value of $K_{p.q.r \ldots}$ in (iii). When
we condense our classification of terms in this way, we collect together a set of x products each
of which carries the numerical coefficient ${}^{n}P_{p.q.r \ldots}$, so that
$K_{p.q.r \ldots} = x \cdot {}^{n}P_{p.q.r \ldots}$. Our problem is now to define x in numerical terms.

To do so we note that any such product as $h^p \cdot i^q \cdot j^r \ldots$ contains e different basic
terms and e not necessarily different letters as exponents attached thereto. For instance the
product of order (2, 2) in the expansion of $(a + b + c)^4$ contains 2 different basic terms
(a, b; a, c; b, c) and two identical exponents (2, 2) which specify its order. Since we group all
terms with the same e different basic terms regardless of linear arrangement and with the same
e-fold set of exponents, the factor x defined as above has two components:


(a) the number of selections of e different letters from a collection of m, i.e.
${}^{m}C_e = m_{(e)} = m^{(e)} / e!$ ;


(b) the number of ways in which we can assign the e letters p, q, r . . . each to one of e
different basic terms.


We have not explicitly examined (b) in our previous treatment of choice. In this context we may
visualise the issue if we regard each basic term in the product as one of e boxes labelled h, i, j,
etc., for purposes of identification. To these boxes we have to allocate the e letters p, q, r, etc.,
in every possible way. If p, q, r are all different, i.e. if each letter occurs only once, as in a product
of the form $h\,i^2 j^3 k^4$ in an expansion of order n = 10, there are e! different ways of doing this,
since we may assign e different letters to the first box, and having done so (e − 1) to the second,
etc. If two of our letters are the same, e.g. p . p . r, there will be only one allocation ppr
corresponding to pqr and qpr, one allocation prp corresponding to prq and qrp and one allocation rpp
corresponding to rpq and rqp. We may generalise the consequences of this consideration as
follows. Of our e letters, we suppose that one occurs $s_1$ times, one occurs $s_2$ times and so forth.
Corresponding to each letter which occurs $s_x$ times there would be $s_x!$ different allocations,
if we substituted $s_x$ different letters. Hence the number of different allocations of the e-fold
set is

$$\frac{e!}{s_1!\,s_2!\,s_3! \ldots}$$

If p, q, r are all different $s_1 = 1 = s_2 = s_3$, etc., and the above reduces to e!. To each of the
$m_{(e)}$ ways of selecting our basic terms the above expression defines in how many ways we can
allocate the same set of exponents. Hence we derive

$$x = m_{(e)} \cdot \frac{e!}{s_1!\,s_2!\,s_3! \ldots} = \frac{m^{(e)}}{s_1!\,s_2!\,s_3! \ldots} \tag{iv}$$

Whence we derive

$$K_{p.q.r \ldots} = \frac{m^{(e)}}{s_1!\,s_2!\,s_3! \ldots} \cdot \frac{n!}{p!\,q!\,r! \ldots} \tag{v}$$

In this symbolism

$$K_{1.2.3} = K_{2.1.3} = K_{3.1.2}, \text{ etc.} = \frac{m^{(3)}}{1!\,1!\,1!} \cdot \frac{6!}{1!\,2!\,3!}$$

$$K_{1.2.2} = K_{2.1.2} = K_{2.2.1} = \frac{m^{(3)}}{2!\,1!} \cdot \frac{5!}{1!\,2!\,2!}$$

$$K_{3.4.4} = K_{4.3.4} = K_{4.4.3} = \frac{m^{(3)}}{2!\,1!} \cdot \frac{11!}{4!\,3!\,4!}$$

$$K_{3.3} = \frac{m^{(2)}}{2!} \cdot \frac{6!}{3!\,3!}$$


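To clarify the meaning of (v) let us consider the expansion of $(a + b + c)^4$. We may expand
this in terms of the same order as follows:

$$(a + b + c)^4 = K_4 h^4 + K_{3.1} h^3 i + K_{2.2} h^2 i^2 + K_{2.1.1} h^2 i j + K_{1.1.1.1} h i j k.$$

In this expansion m = 3 and n = 4, the values of e corresponding to the numerical coefficients
being successively 1, 2, 2, 3 and 4. In two products ($h^2 i^2$ and $h^2 i j$) the same exponent (2 and 1
respectively) occurs twice. Otherwise no exponent occurs more than once. Thus we have

$$K_4 = \frac{3^{(1)}}{1!} \cdot \frac{4!}{4!} = 3$$

$$K_{3.1} = \frac{3^{(2)}}{1!\,1!} \cdot \frac{4!}{3!\,1!} = 24$$

$$K_{2.2} = \frac{3^{(2)}}{2!} \cdot \frac{4!}{2!\,2!} = 18$$

$$K_{2.1.1} = \frac{3^{(3)}}{1!\,2!} \cdot \frac{4!}{2!\,1!\,1!} = 36$$

$$K_{1.1.1.1} = \frac{3^{(4)}}{4!} \cdot \frac{4!}{1!\,1!\,1!\,1!} = 0$$

$$\text{Total} = 81$$

The total 81 checks, being $3^4 = m^n$. More fully, we have

$$a^4 + b^4 + c^4 = 3h^4,$$

$$4a^3b + 4a^3c + 4b^3a + 4b^3c + 4c^3a + 4c^3b = 24h^3i,$$

$$6a^2b^2 + 6a^2c^2 + 6b^2c^2 = 18h^2i^2,$$

$$12a^2bc + 12b^2ac + 12c^2ab = 36h^2ij.$$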
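The count of products of the same order lends itself to a mechanical check. The following is a minimal sketch, not the author's procedure: `k_coefficient` evaluates the rule stated above (falling factorial of m over the factorials of the repeat-counts of equal exponents, times the multinomial coefficient), and `k_by_enumeration` counts the cells of the n-fold chessboard directly. Both function names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import factorial

def falling(m, e):
    """Falling factorial m^(e) = m(m-1)...(m-e+1); zero when e > m."""
    out = 1
    for i in range(e):
        out *= m - i
    return out

def k_coefficient(m, exponents):
    """Number of ordered products of the given order in (a+b+c ... m terms)^n,
    where n is the sum of the exponents."""
    n = sum(exponents)
    repeats = 1
    for s in Counter(exponents).values():   # s_1!, s_2!, ... for equal exponents
        repeats *= factorial(s)
    multinomial = factorial(n)
    for p in exponents:
        multinomial //= factorial(p)
    return falling(m, len(exponents)) // repeats * multinomial

def k_by_enumeration(m, n):
    """Count the m**n cells of the chessboard by exponent pattern."""
    counts = Counter()
    for word in product(range(m), repeat=n):
        pattern = tuple(sorted(Counter(word).values(), reverse=True))
        counts[pattern] += 1
    return dict(counts)

if __name__ == "__main__":
    print(k_by_enumeration(3, 4))
    print([k_coefficient(3, e) for e in ((4,), (3, 1), (2, 2), (2, 1, 1), (1, 1, 1, 1))])
```

For m = 3 and n = 4 the enumeration yields only four patterns, because no product of four different letters can arise from three basic terms; the rule gives zero for that order through the vanishing falling factorial.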


From the foregoing, we may write down the general expressions for the expansion of orders
n = 2, 3, 4 as follows:

$$(a + b + c \ldots)^2 = m\,h^2 + m^{(2)} h i \tag{vi}$$

$$(a + b + c \ldots)^3 = m\,h^3 + 3m^{(2)} h^2 i + m^{(3)} h i j \tag{vii}$$

$$(a + b + c \ldots)^4 = m\,h^4 + 4m^{(2)} h^3 i + 3m^{(2)} h^2 i^2 + 6m^{(3)} h^2 i j + m^{(4)} h i j k \tag{viii}$$

In these expressions

$$K_2 = K_3 = K_4 = m \tag{ix}$$

$$K_{1.1} = m^{(2)}; \quad K_{1.1.1} = m^{(3)}; \quad K_{1.1.1.1} = m^{(4)} \tag{x}$$

$$K_{2.1} = 3m^{(2)} = K_{2.2}; \quad K_{2.1.1} = 6m^{(3)} \tag{xi}$$

$$K_{3.1} = 4m^{(2)} \tag{xii}$$

We may now derive expressions for the zero moments of the generalised distribution of
the score-sum by the method outlined in 19.02, where we obtained the results

$$\mu_2(S_3) = 3\mu_2 + 6\mu_1^2,$$

$$\mu_3(S_3) = 3\mu_3 + 18\mu_2\mu_1 + 6\mu_1^3,$$

$$\mu_4(S_3) = 3\mu_4 + 24\mu_3\mu_1 + 18\mu_2^2 + 36\mu_2\mu_1^2.$$

These expressions refer to the 3-fold sample. More generally, for the r-fold sample, the
numerical coefficients in the expression for the kth moment are obtainable by expanding
$(x_1 + x_2 + x_3 \ldots x_r)^k$, so that r replaces m (the number of basic terms) in (vi)-(viii); and
we derive

$$\mu_2(S_r) = r\mu_2 + r^{(2)}\mu_1^2 \tag{xiii}$$

$$\mu_3(S_r) = r\mu_3 + 3r^{(2)}\mu_2\mu_1 + r^{(3)}\mu_1^3 \tag{xiv}$$

$$\mu_4(S_r) = r\mu_4 + 4r^{(2)}\mu_3\mu_1 + 3r^{(2)}\mu_2^2 + 6r^{(3)}\mu_2\mu_1^2 + r^{(4)}\mu_1^4 \tag{xv}$$

We can obtain the above by recourse to the method of iteration employed to derive expressions
for the mean moments in 14.05; and we can derive the latter from them, if we recall that the
expected score-sum of the r-fold sample is r times the expected score of the unit sample, i.e.
that

$$\mu_1(S_r) = r\mu_1.$$

By the now familiar relation between mean moments and zero moments

$$m_4(S_r) = \mu_4(S_r) - 4\mu_3(S_r)\,\mu_1(S_r) + 6\mu_2(S_r)\,\mu_1^2(S_r) - 3\mu_1^4(S_r)$$

$$= \mu_4(S_r) - 4r\mu_1 \cdot \mu_3(S_r) + 6r^2\mu_1^2 \cdot \mu_2(S_r) - 3r^4\mu_1^4$$

$$= r\mu_4 - 4r\mu_3\mu_1 + 3r^{(2)}\mu_2^2 - 6r(r - 2)\mu_2\mu_1^2 + 3r(r - 2)\mu_1^4$$

$$= r \cdot m_4 + 3r^{(2)} m_2^2.$$
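The relation between the fourth mean moment of the score-sum and the mean moments of the unit sample, for sampling with replacement, can be confirmed numerically. This is a sketch under stated assumptions: the six-card pack is invented for illustration, and `mean_moment` and `fourth_mean_moment_check` are hypothetical helper names.

```python
from fractions import Fraction
from itertools import product

def mean_moment(values, k):
    """k-th moment about the mean of an equally weighted list of values."""
    mean = Fraction(sum(values), len(values))
    return sum((x - mean)**k for x in values) / len(values)

def fourth_mean_moment_check(universe, r):
    """Return (enumerated, predicted) fourth mean moments of the r-fold
    score-sum when sampling WITH replacement."""
    sums = [sum(sample) for sample in product(universe, repeat=r)]
    m2 = mean_moment(universe, 2)
    m4 = mean_moment(universe, 4)
    predicted = r*m4 + 3*r*(r - 1)*m2**2   # r.m4 + 3 r^(2) m2^2
    return mean_moment(sums, 4), predicted

UNIVERSE = [1, 2, 2, 2, 3, 3]   # hypothetical pack

if __name__ == "__main__":
    for r in (1, 2, 3, 4):
        print(r, *fourth_mean_moment_check(UNIVERSE, r))
```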


The result agrees with the result obtained in 14.05; but the method would be laborious for 
higher moments. In deriving (xiii)-(xv) in this way, our object as stated in 19.02 is to explore 
a method which we shall now adapt in order to get similar expressions for the non-replacement 
distribution. To keep our aim in view let us again recall the general expression for the second 
zero moment of the 3-fold sample, viz.:

$$\mu_2(S_3) = E(x_1^2) + E(x_2^2) + E(x_3^2) + 2E(x_1 \cdot x_2) + 2E(x_1 \cdot x_3) + 2E(x_2 \cdot x_3).$$

If there is replacement we assume that $E(x_u^2) = \mu_2$ (the 2nd zero moment of the unit sample)
regardless of the value of u which denotes the order of choice, and that $E(x_u \cdot x_v) = \mu_{1.1}$,
the co-moment of order (1, 1), has one and the same numerical value for the score product of any
pair of unit scores which need not necessarily have different numerical values. We may then write
as a particular case of (xiii)

$$\mu_2(S_3) = 3\mu_2 + 6\mu_{1.1} = 3\mu_2 + 6\mu_1^2.$$

In 11.04 we have already seen that we can write $\mu_{1.1} = \mu_1^2$ if and only if the replacement
condition holds good; but our examination of the non-replacement model in 2.07 has also shown
us that previous extraction of an a-fold sample affects neither the mean nor the variance of the
b-fold sample taken from the binary universe without replacement. Hence we have some reason
to hope that order of choice will affect neither the moments of the unit sub-sample nor the
co-moments in the domain of representative scoring without replacement. If so, we can
obtain expressions like those of (xiii)-(xv) in the form

$$\mu_2[S_{(r)}] = r\mu_2 + r^{(2)}\mu_{1.1} \tag{xvi}$$

$$\mu_3[S_{(r)}] = r\mu_3 + 3r^{(2)}\mu_{2.1} + r^{(3)}\mu_{1.1.1} \tag{xvii}$$

$$\mu_4[S_{(r)}] = r\mu_4 + 4r^{(2)}\mu_{3.1} + 3r^{(2)}\mu_{2.2} + 6r^{(3)}\mu_{2.1.1} + r^{(4)}\mu_{1.1.1.1} \tag{xviii}$$


The task stated in the opening paragraphs of 19.02 is thus to show that 


(a) order of choice does not affect the value of the moments of the unit sub-samples or 
that of the co-moments ; 


(b) the co-moments are expressible in terms of the moments of the unit sample. 


19.04 SAMPLING WITHOUT REPLACEMENT IN THE FINITE UNIVERSE 
OF REPRESENTATIVE SCORING 


Whether we sample with or without replacement, we must regard our universe as an assemblage 
of like or different items identifiable as members of a class in virtue of the fact that we attach 
to each of them one and the same score value ; but if we replace each item before extracting 
another, it is immaterial whether we can also distinguish one member from another member 
of one and the same class. Contrariwise, it is essential that we should be able to do so, when 
our aim is to clarify the consequences of sampling without replacement. With that end in 
view, let us recall the definition of the score-sum in 19.02, where we write that of the 3-fold 
score sum as 


$$S_{(3)} = x_1 + x_2 + x_3.$$

In doing so, we label the unit scores ($x_1, x_2, x_3$) by a subscript referable to the order in which
we successively extract them. If we extract them simultaneously, we may visualise $x_1, x_2, x_3$
as score values of the first, second and third cards when we lay them face uppermost in a line.
To say this is to say that the pack no longer contains the cards whose score values are $x_1$, $x_2$
and $x_3$. More generally, we write the score-sum of the r-fold non-replacement sample as

$$S_{(r)} = \sum_{u=1}^{r} x_u \tag{i}$$

If we do replace each item before extracting another, we may write the first four moments of
the r-fold sample distribution of the score-sum $S_r$ as in the derivation of (xiii)-(xv) of 19.03, viz.

$$\mu_1(S_r) = r \cdot E(x_u) \tag{ii}$$

$$\mu_2(S_r) = r \cdot E(x_u^2) + r^{(2)} \cdot E(x_u \cdot x_v) \tag{iii}$$

$$\mu_3(S_r) = r \cdot E(x_u^3) + 3r^{(2)} \cdot E(x_u^2 \cdot x_v) + r^{(3)} \cdot E(x_u \cdot x_v \cdot x_w) \tag{iv}$$

$$\mu_4(S_r) = r \cdot E(x_u^4) + 4r^{(2)} \cdot E(x_u^3 \cdot x_v) + 3r^{(2)} \cdot E(x_u^2 \cdot x_v^2) + 6r^{(3)} \cdot E(x_u^2 \cdot x_v \cdot x_w) + r^{(4)} \cdot E(x_u \cdot x_v \cdot x_w \cdot x_z) \tag{v}$$

Now the coefficients of $E(x_u^k)$, $E(x_u^k \cdot x_v^m)$, etc., in the foregoing expressions correspond to
a summation of like terms on the assumption that $E(x_u^k)$, $E(x_u^k \cdot x_v^m)$, etc., have the same value
regardless of choice-order, e.g. $E(x_1^3) = E(x_2^3)$ or $E(x_1^2 \cdot x_2) = E(x_2^2 \cdot x_3)$. We have to
establish the truth of this assumption before we can employ (ii)-(v) to evaluate the moments of the
non-replacement distribution.

In 12.02 we have in fact already shown that

$$E(x_a) = \mu_1; \qquad E(x_a^2) = \mu_2; \qquad E(x_a \cdot x_b) = \frac{n^2}{n^{(2)}}\mu_1^2 - \frac{n}{n^{(2)}}\mu_2.$$

We shall now generalise the argument employed in 12.02 to establish the conclusion that


(a) order of choice does not affect the mean value of the kth power of the unit sub-sample
score when sampling occurs without replacement ;


(b) order of choice does not affect the mean value of a co-moment of given order on the
same assumption.


First, let us be clear that $x_u^k$ is the kth power of the unit sample score at the uth draw, whereas
$x_u^k \cdot x_v^m$ is the product of the kth and mth powers of the component scores of the 2-fold
sub-sample consisting of an item extracted at the uth and an item extracted at the vth draw. In the
same way $x_u^k \cdot x_v^m \cdot x_w^p$ is a score referable to a 3-fold sub-sample. Thus $x_u^k$ may be
numerically equal to $x_u^m \cdot x_v^{k-m}$ if $x_u = x_v$; but $E(x_u^k)$ will not in general be
numerically equal to $E(x_u^m \cdot x_v^{k-m})$.

As in 12.02 our model for what follows may be a pack containing no picture cards. Having
extracted from a pack of n cards an r-fold sample which we distinguish from those which remain
by the choice-order subscripts 1 to r, we may turn the remaining (n − r) cards of our model
pack upwards in a row and label the score of each by its order in the sequence regardless of
its numerical value, starting with $x_{r+1}$ and ending with $x_n$. If we then denote the sum of the
scores of this residual universe by $S_{(n-r)}$ :

$$S_{(n-r)} = \sum_{u=r+1}^{n} x_u \tag{vi}$$

On this understanding we can identify each item in the universe of n cards; and we may denote
the sum of all the score values as

$$S_{(n)} = \sum_{u=1}^{n} x_u \tag{vii}$$

In the same way we may denote the sum of the kth powers of the scores in the r-fold sample,
the (n − r)-fold residual universe and the universe as a whole by

$$S_{k(r)} = \sum_{u=1}^{r} x_u^k; \qquad S_{k(n-r)} = \sum_{u=r+1}^{n} x_u^k; \qquad S_{k(n)} = \sum_{u=1}^{n} x_u^k \tag{viii}$$

In this symbolism, the kth zero moment of the unit sample distribution is by definition

$$\mu_k = \frac{1}{n} S_{k(n)} \tag{ix}$$
[Figure: panels "The Cardpack", "Chessboard for 2-fold Sample" and "Histogram for 2-fold Sample
without Replacement"; histogram axes: Sum (X), Mean (X/2), Frequency.]

Fig. 124. Sampling without Replacement in a Finite Universe.

The universe consists of 6 items to which we attach the values 1, 2, 3 definitive of the score class with relative
frequencies 1 : 3 : 2. In drawing an r-fold sample, we can take any item once only, but we may draw more than
one item of the same score class if the class itself contains more than one. To keep track of what we have taken,
we distinguish members of the same score class by suit. The reader may complete as an exercise the chessboard
and histogram of the 3-fold sample and thus show that the half-universe sample (F = ½) has a symmetrical distribution.
The score sums (X) are 5, 6, 7, 8 with relative frequencies 18 : 42 : 42 : 18. For X = 5 and X = 8 there are 18
ways of taking the combinations 1, 2, 2 and 2, 3, 3 respectively. For X = 6 and X = 7 there are 36 ways of taking
the combinations 1, 2, 3 and 2, 2, 3 respectively, and 6 ways of taking the combinations 2, 2, 2 and 1, 3, 3 respectively.

We shall now suppose that we have already extracted without replacement an (a − 1)-fold
sample, so that the sum of the kth powers of the scores in the residual pack of (n − a + 1) cards
is in this notation $S_{k(n-a+1)}$. The next draw is the ath and the score drawn is $x_a$. If we have
not replaced the sample of (a − 1) cards previously drawn, the mean value of $x_a^k$ is the mean
value of the kth power of the score from a residual universe of (n − a + 1) items, i.e.

$$E(x_a^k) = \frac{E[S_{k(n-a+1)}]}{n - a + 1} \tag{x}$$

If we draw again, the score of the next card taken is $x_b = x_{a+1}$ and the sum of the kth powers
of the scores in the residual universe from which we take it is $S_{k(n-a+1)} - x_a^k$, and this residual
universe consists of (n − a) cards. For a fixed value of $x_a$, we may therefore write the mean
value of $x_b^k$ as

$$E_{b.a}(x_b^k) = \frac{S_{k(n-a+1)} - x_a^k}{n - a}.$$

We may now employ the customary grid operation:

$$E(x_b^k) = E_a \cdot E_{b.a}(x_b^k) = \frac{E[S_{k(n-a+1)}]}{n - a} - \frac{E(x_a^k)}{n - a}.$$

Whence from (x):

$$E(x_b^k) = \frac{E[S_{k(n-a+1)}]}{n - a} - \frac{E[S_{k(n-a+1)}]}{(n - a)(n - a + 1)} = \frac{E[S_{k(n-a+1)}]}{n - a + 1} = E(x_a^k),$$

$$\therefore\ E(x_{a+1}^k) = E(x_a^k),$$

$$\therefore\ E(x_u^k) = E(x_1^k) = \frac{1}{n} S_{k(n)} = \mu_k \tag{xi}$$
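The conclusion that order of choice does not affect the mean kth power of the unit score can be illustrated by enumeration. The sketch below assumes a six-card pack with scores 1, 2, 3 in the ratio 1 : 3 : 2; `draw_moment` is a hypothetical helper introduced only for this check.

```python
from fractions import Fraction
from itertools import permutations

def draw_moment(universe, r, u, k):
    """E(x_u^k): mean k-th power of the score taken at the u-th draw (u = 1..r),
    over all ordered r-fold samples drawn WITHOUT replacement.
    itertools.permutations treats cards as distinct by position, so cards of
    the same score class are kept apart, as the argument requires."""
    samples = list(permutations(universe, r))
    return Fraction(sum(s[u - 1]**k for s in samples), len(samples))

PACK = [1, 2, 2, 2, 3, 3]   # assumed universe, scores 1:3:2

if __name__ == "__main__":
    for k in (1, 2, 3):
        mu_k = Fraction(sum(x**k for x in PACK), len(PACK))
        print(k, mu_k, [draw_moment(PACK, 3, u, k) for u in (1, 2, 3)])
```

Every entry in a row should equal the corresponding zero moment of the whole pack, whichever draw we inspect.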


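We have thus shown that $E(x_u^k)$ does not depend on order of choice. For the first term
in (ii)-(v) we may therefore write $r(\mu_k)$. To evaluate an expression of the form $E(x_a^k \cdot x_b^m)$
we need therefore place no restriction on the value of b other than that it lies in the range 1 to r
excluding a in the same range. On that understanding we write

$$E(x_a^k \cdot x_b^m) = E_a[x_a^k \cdot E_{b.a}(x_b^m)].$$

In this expression $E_{b.a}(x_b^m)$ is the mean value of $x_b^m$ associated with a fixed value of $x_a$,
i.e. the mean value of the mth power of the unit score of a pack from which we have thrown out the
card whose score value is $x_a$. The pack so defined contains (n − 1) cards and the sum of the mth
powers of the unit scores therein is $(S_{m(n)} - x_a^m)$, so that

$$E_{b.a}(x_b^m) = \frac{S_{m(n)} - x_a^m}{n - 1} = \frac{n}{n - 1}\mu_m - \frac{x_a^m}{n - 1},$$

$$x_a^k \cdot E_{b.a}(x_b^m) = \frac{n}{n - 1}\mu_m \cdot x_a^k - \frac{1}{n - 1}x_a^{k+m},$$

$$\therefore\ E_a[x_a^k \cdot E_{b.a}(x_b^m)] = \frac{n}{n - 1}\mu_m \cdot E(x_a^k) - \frac{1}{n - 1}E(x_a^{k+m}),$$

$$\therefore\ E(x_a^k \cdot x_b^m) = \frac{n}{n - 1}\mu_k \cdot \mu_m - \frac{1}{n - 1}\mu_{k+m} \tag{xii}$$

We may write this in the form

$$\mu_{k.m} = \frac{n^2}{n^{(2)}}\mu_k \cdot \mu_m - \frac{n}{n^{(2)}}\mu_{k+m} \tag{xiii}$$

In the same way we may write

$$E(x_a^k \cdot x_b^m \cdot x_c^p) = E_a \cdot E_{b.a}[x_a^k \cdot x_b^m \cdot E_{c.ab}(x_c^p)].$$

To interpret $E_{c.ab}(x_c^p)$ we recall that we sample from a pack from which we have thrown out
$x_a$ and $x_b$, so that

$$E_{c.ab}(x_c^p) = \frac{S_{p(n)} - x_a^p - x_b^p}{n - 2} = \frac{n}{n - 2}\mu_p - \frac{x_a^p + x_b^p}{n - 2},$$

$$\therefore\ E(x_a^k \cdot x_b^m \cdot x_c^p) = \frac{n}{n - 2}\mu_p \cdot E_a \cdot E_{b.a}(x_a^k \cdot x_b^m) - \frac{1}{n - 2}E_a \cdot E_{b.a}(x_a^{k+p} \cdot x_b^m + x_a^k \cdot x_b^{m+p}).$$

In this expression

$$E_a \cdot E_{b.a}(x_a^k \cdot x_b^m) = E(x_a^k \cdot x_b^m) = \mu_{k.m}.$$

Whence from (xiii)

$$\mu_{k.m.p} = \frac{n^3}{n^{(3)}}\mu_k \cdot \mu_m \cdot \mu_p - \frac{n^2}{n^{(3)}}\mu_{k+m} \cdot \mu_p - \frac{n^2}{n^{(3)}}\mu_{k+p} \cdot \mu_m - \frac{n^2}{n^{(3)}}\mu_k \cdot \mu_{m+p} + \frac{2n}{n^{(3)}}\mu_{k+m+p} \tag{xiv}$$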


In the same way we derive

$$\mu_{k.m.p.q} = \frac{n}{n - 3}\mu_q \cdot \mu_{k.m.p} - \frac{1}{n - 3}\mu_{(k+q).m.p} - \frac{1}{n - 3}\mu_{k.(m+q).p} - \frac{1}{n - 3}\mu_{k.m.(p+q)}.$$

Hence from (xiii) and (xiv) we derive

$$\mu_{k.m.p.q} = \frac{1}{(n - 1)^{(3)}}\Big[n^3\mu_k\mu_m\mu_p\mu_q - n^2\{\mu_p\mu_q\mu_{k+m} + \mu_m\mu_q\mu_{k+p} + \mu_m\mu_p\mu_{k+q} + \mu_k\mu_q\mu_{m+p} + \mu_k\mu_p\mu_{m+q} + \mu_k\mu_m\mu_{p+q}\}$$
$$\qquad + n\{\mu_{k+m}\mu_{p+q} + \mu_{k+p}\mu_{m+q} + \mu_{k+q}\mu_{m+p}\} + 2n\{\mu_k\mu_{m+p+q} + \mu_m\mu_{k+p+q} + \mu_p\mu_{k+m+q} + \mu_q\mu_{k+m+p}\} - 6\mu_{k+m+p+q}\Big] \tag{xv}$$

From (xiii), (xiv) and (xv) we thus derive the following co-moments :

$$\mu_{1.1} = \frac{n^2}{n^{(2)}}\mu_1^2 - \frac{n}{n^{(2)}}\mu_2 ;$$

$$\mu_{2.1} = \frac{n^2}{n^{(2)}}\mu_2\mu_1 - \frac{n}{n^{(2)}}\mu_3 ;$$

$$\mu_{2.2} = \frac{n^2}{n^{(2)}}\mu_2^2 - \frac{n}{n^{(2)}}\mu_4 ;$$

$$\mu_{3.1} = \frac{n^2}{n^{(2)}}\mu_3\mu_1 - \frac{n}{n^{(2)}}\mu_4 ;$$

$$\mu_{1.1.1} = \frac{1}{n^{(3)}}[n^3\mu_1^3 - 3n^2\mu_2\mu_1 + 2n\mu_3] ;$$

$$\mu_{2.1.1} = \frac{1}{n^{(3)}}[n^3\mu_2\mu_1^2 - 2n^2\mu_3\mu_1 - n^2\mu_2^2 + 2n\mu_4] ;$$

$$\mu_{1.1.1.1} = \frac{1}{n^{(4)}}[n^4\mu_1^4 - 6n^3\mu_2\mu_1^2 + 3n^2\mu_2^2 + 8n^2\mu_3\mu_1 - 6n\mu_4].$$

The reader will note that every term except the first in the above expressions vanishes when
n is indefinitely large, so that the replacement condition is irrelevant and
$\mu_{u.v} = \mu_u \cdot \mu_v$, etc., as when replacement occurs, e.g.

$$\mu_{1.1.1.1} = \mu_1^4.$$
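The co-moments of the non-replacement model can be checked against direct enumeration. The sketch below rests on assumptions: the formulae coded in `co_moments_from_formulae` are the co-moment expressions of this section as read here (expressed through the zero moments of the unit sample and the falling factorials of n), and the six-card pack and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def zero_moment(universe, k):
    """k-th zero moment of the unit sample distribution."""
    return Fraction(sum(x**k for x in universe), len(universe))

def co_moment(universe, exponents):
    """E(x_a^k . x_b^m ...) over all ordered samples WITHOUT replacement."""
    samples = list(permutations(universe, len(exponents)))
    total = 0
    for sample in samples:
        term = 1
        for score, power in zip(sample, exponents):
            term *= score**power
        total += term
    return Fraction(total, len(samples))

def falling(n, k):
    """Falling factorial n^(k) = n(n-1)...(n-k+1)."""
    out = 1
    for i in range(k):
        out *= n - i
    return out

def co_moments_from_formulae(universe):
    """Co-moments expressed in terms of the zero moments of the unit sample."""
    n = len(universe)
    m1, m2, m3, m4 = (zero_moment(universe, k) for k in (1, 2, 3, 4))
    n2, n3, n4 = falling(n, 2), falling(n, 3), falling(n, 4)
    return {
        (1, 1): (n**2*m1**2 - n*m2) / n2,
        (2, 1): (n**2*m2*m1 - n*m3) / n2,
        (2, 2): (n**2*m2**2 - n*m4) / n2,
        (3, 1): (n**2*m3*m1 - n*m4) / n2,
        (1, 1, 1): (n**3*m1**3 - 3*n**2*m2*m1 + 2*n*m3) / n3,
        (2, 1, 1): (n**3*m2*m1**2 - 2*n**2*m3*m1 - n**2*m2**2 + 2*n*m4) / n3,
        (1, 1, 1, 1): (n**4*m1**4 - 6*n**3*m2*m1**2 + 3*n**2*m2**2
                       + 8*n**2*m3*m1 - 6*n*m4) / n4,
    }

PACK = [1, 2, 2, 2, 3, 3]   # assumed universe

if __name__ == "__main__":
    for order, value in co_moments_from_formulae(PACK).items():
        print(order, value, co_moment(PACK, order))
```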


We now recall the general expressions for the first four zero moments of the r-fold score-sum
non-replacement sampling distribution in terms of the zero moments of the unit sample
distribution and of the above co-moments, viz. :

$$\mu_1(S_{(r)}) = r\mu_1 ;$$

$$\mu_2(S_{(r)}) = r\mu_2 + r^{(2)}\mu_{1.1} ;$$

$$\mu_3(S_{(r)}) = r\mu_3 + 3r^{(2)}\mu_{2.1} + r^{(3)}\mu_{1.1.1} ;$$

$$\mu_4(S_{(r)}) = r\mu_4 + 4r^{(2)}\mu_{3.1} + 3r^{(2)}\mu_{2.2} + 6r^{(3)}\mu_{2.1.1} + r^{(4)}\mu_{1.1.1.1}.$$

Hence we have

$$\mu_2(S_{(r)}) = r\left[\mu_2 + \frac{r - 1}{n - 1}(n\mu_1^2 - \mu_2)\right] = \frac{r}{n - 1}[(n - r)\mu_2 + n(r - 1)\mu_1^2].$$

If $m_2$ is the variance of the unit sample distribution, the corresponding mean moment of the
score-sum distribution is therefore

$$m_2(S_{(r)}) = \frac{r}{n - 1}[(n - r)\mu_2 + \{n(r - 1) - r(n - 1)\}\mu_1^2],$$

$$\therefore\ m_2(S_{(r)}) = \frac{r(n - r)}{n - 1} m_2 \tag{xvi}$$

Similarly, the third zero moment of the score-sum distribution is given by

$$\mu_3(S_{(r)}) = r\mu_3 + \frac{3r^{(2)}}{n - 1}[n\mu_2\mu_1 - \mu_3] + \frac{r^{(3)}}{(n - 1)^{(2)}}[n^2\mu_1^3 - 3n\mu_2\mu_1 + 2\mu_3]$$

$$= r\mu_3\left[1 - \frac{3(r - 1)}{n - 1} + \frac{2(r - 1)(r - 2)}{(n - 1)(n - 2)}\right] + \frac{3n\,r^{(2)}}{n - 1}\left[1 - \frac{r - 2}{n - 2}\right]\mu_2\mu_1 + \frac{n^2 r^{(3)}}{(n - 1)^{(2)}}\mu_1^3$$

$$= \frac{r(n - r)(n - 2r)}{(n - 1)^{(2)}}\mu_3 + \frac{3n(n - r)r^{(2)}}{(n - 1)^{(2)}}\mu_2\mu_1 + \frac{n^2 r^{(3)}}{(n - 1)^{(2)}}\mu_1^3.$$

Whence we derive by the customary conversion formula

$$m_3(S_{(r)}) = \frac{r(n - r)(n - 2r)}{(n - 1)^{(2)}}[\mu_3 - 3\mu_2\mu_1 + 2\mu_1^3],$$

$$\therefore\ m_3(S_{(r)}) = \frac{r(n - r)(n - 2r)}{(n - 1)^{(2)}} m_3 \tag{xvii}$$

In the same way, we get the fourth zero moment of the r-fold sampling distribution of the
score-sum

$$\mu_4(S_{(r)}) = r\mu_4 + \frac{4r^{(2)}}{n - 1}[n\mu_3\mu_1 - \mu_4] + \frac{3r^{(2)}}{n - 1}[n\mu_2^2 - \mu_4] + \frac{6r^{(3)}}{(n - 1)^{(2)}}[n^2\mu_2\mu_1^2 - 2n\mu_3\mu_1 - n\mu_2^2 + 2\mu_4]$$
$$\qquad + \frac{r^{(4)}}{(n - 1)^{(3)}}[n^3\mu_1^4 - 6n^2\mu_2\mu_1^2 + 3n\mu_2^2 + 8n\mu_3\mu_1 - 6\mu_4]$$

$$= \frac{r(n - r)}{(n - 1)^{(3)}}[(n - 2r)(n - 3r) - n(r - 1)]\mu_4 + \frac{4n(n - r)(n - 2r + 1)r^{(2)}}{(n - 1)^{(3)}}\mu_3\mu_1 + \frac{3n\,r^{(2)}(n - r)(n - r - 1)}{(n - 1)^{(3)}}\mu_2^2$$
$$\qquad + \frac{6n^2 r^{(3)}(n - r)}{(n - 1)^{(3)}}\mu_2\mu_1^2 + \frac{n^3 r^{(4)}}{(n - 1)^{(3)}}\mu_1^4.$$

The derivation of the corresponding mean moment is elementary but rather tedious, and it
will suffice to give the result

$$m_4(S_{(r)}) = \frac{r(n - r)[(n - 2r)(n - 3r) - n(r - 1)]}{(n - 1)^{(3)}} m_4 + \frac{3n\,r^{(2)}(n - r)(n - r - 1)}{(n - 1)^{(3)}} m_2^2 \tag{xviii}$$
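A numerical check of the mean moments of the non-replacement score-sum may be helpful here. The sketch assumes the results take the form $m_2(S_{(r)}) = r(n-r)m_2/(n-1)$, $m_3(S_{(r)}) = r(n-r)(n-2r)m_3/[(n-1)(n-2)]$ and the two-term expression for $m_4(S_{(r)})$ in $m_4$ and $m_2^2$ with divisor $(n-1)(n-2)(n-3)$; the pack and the function names are hypothetical. Since every combination of r cards corresponds to the same number of permutations, it suffices to enumerate combinations.

```python
from fractions import Fraction
from itertools import combinations

def mean_moment(values, k):
    """k-th moment about the mean of an equally weighted list of values."""
    mean = Fraction(sum(values), len(values))
    return sum((x - mean)**k for x in values) / len(values)

def predicted_mean_moments(universe, r):
    """Second, third and fourth mean moments of the r-fold score-sum
    without replacement, from the unit-sample mean moments (needs n >= 4)."""
    n = len(universe)
    m2, m3, m4 = (mean_moment(universe, k) for k in (2, 3, 4))
    d2 = n - 1
    d3 = (n - 1)*(n - 2)
    d4 = (n - 1)*(n - 2)*(n - 3)
    big_m2 = Fraction(r*(n - r), d2) * m2
    big_m3 = Fraction(r*(n - r)*(n - 2*r), d3) * m3
    big_m4 = (Fraction(r*(n - r)*((n - 2*r)*(n - 3*r) - n*(r - 1)), d4) * m4
              + Fraction(3*n*r*(r - 1)*(n - r)*(n - r - 1), d4) * m2**2)
    return big_m2, big_m3, big_m4

def enumerated_mean_moments(universe, r):
    """The same moments by listing every r-fold combination of cards."""
    sums = [sum(c) for c in combinations(universe, r)]
    return tuple(mean_moment(sums, k) for k in (2, 3, 4))

PACK = [1, 2, 2, 2, 3, 3]   # assumed universe

if __name__ == "__main__":
    for r in range(1, len(PACK)):
        print(r, predicted_mean_moments(PACK, r), enumerated_mean_moments(PACK, r))
```

With r = 3, the half-universe sample, the third mean moment should come out as zero in both columns.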


From (xvi)-(xviii) we obtain in the usual way the first two Pearson coefficients of the r-fold
score-sum distribution in terms of the Pearson coefficients of that of the unit sample, viz.

$$\beta_1(S_{(r)}) = \frac{(n - 1)(n - 2r)^2}{r(n - r)(n - 2)^2}\beta_1 \tag{xix}$$

$$\beta_2(S_{(r)}) = \frac{n - 1}{r(n - r)(n - 2)(n - 3)}\big[\{(n - 2r)(n - 3r) - n(r - 1)\}\beta_2 + 3n(r - 1)(n - r - 1)\big] \tag{xx}$$

If n is so large that we can neglect $3n^{-1}$, we may simplify the above as follows by the substitution
of the sampling fraction $F = rn^{-1}$ :

$$\beta_1(S_{(r)}) = \frac{(1 - 2F)^2}{r(1 - F)}\beta_1 \tag{xxi}$$

$$\beta_2(S_{(r)}) = 3 + \frac{[1 - 6F(1 - F)]\beta_2 - 3}{r(1 - F)} \tag{xxii}$$


The first expression vanishes if $F = \tfrac{1}{2}$, suggesting that the distribution becomes symmetrical
even if the parent universe is skew. We have already seen that this is so of sampling without
replacement in the taxonomic universe of 2 classes. For the more general case under discussion,
the symmetry of the distribution when $F = \tfrac{1}{2}$ follows from elementary principles, if we bear
in mind the fact that each combination of a out of n letters corresponds to the same number,
i.e. a!, of permutations. Hence the frequencies of score-sums will be in the same ratio as the
numbers of combinations of items whose total score is the same. If $a = \tfrac{1}{2}n$, there will be a
unique combination with score-sum $S_n - S_{(a)}$ for each unique combination with score-sum $S_{(a)}$.
Thus scores of $S_{(a)}$ and $S_n - S_{(a)}$ will occur with equal frequency when $a = \tfrac{1}{2}n$. The mean
score-sum is then $\tfrac{1}{2}S_n$ and the deviation of $S_{(a)}$ therefrom is $(S_{(a)} - \tfrac{1}{2}S_n) = +s_{(a)}$. That of
each corresponding combination whose score-sum is $(S_n - S_{(a)})$ will be

$$(S_n - S_{(a)} - \tfrac{1}{2}S_n) = (\tfrac{1}{2}S_n - S_{(a)}) = -s_{(a)}.$$

Thus score deviations of $+s_{(a)}$ and $-s_{(a)}$ must occur with equal frequency.
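The symmetry of the half-universe sample is easy to exhibit by enumeration. As a sketch, take the 6-card universe of Fig. 124 (scores 1, 2, 3 with relative frequencies 1 : 3 : 2); the function name is hypothetical. The code lists every ordered sample drawn without replacement and tallies the score-sums.

```python
from collections import Counter
from itertools import permutations

def score_sum_frequencies(universe, r):
    """Frequencies of the score-sum over all ordered r-fold samples taken
    WITHOUT replacement (cards are distinguished by position in the list)."""
    tally = Counter(sum(sample) for sample in permutations(universe, r))
    return dict(sorted(tally.items()))

PACK = [1, 2, 2, 2, 3, 3]   # universe of Fig. 124

if __name__ == "__main__":
    print(score_sum_frequencies(PACK, 2))   # skew 2-fold sample
    print(score_sum_frequencies(PACK, 3))   # symmetrical half-universe sample
```

The 3-fold tally reproduces the frequencies 18 : 42 : 42 : 18 quoted in the legend of Fig. 124, although the universe itself is skew.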
When $F = \tfrac{1}{2}$, we have

$$\beta_1(S_{(r)}) = 0 \quad \text{and} \quad \beta_2(S_{(r)}) = 3 - \frac{6 + \beta_2}{r} \tag{xxiii}$$

The expression on the right indicates that the distribution is necessarily platykurtic when
$F = \tfrac{1}{2}$, as is true of the symmetrical Type II distribution. If we substitute for j in (xx) of
15.04, we have

$$j = \frac{3(2r - 6 - \beta_2)}{2(6 + \beta_2)}.$$

How closely this conforms to the restrictive condition mentioned at the end of 19.01 evidently
depends on the character of the unit sample distribution. More generally, $\beta_1(S_{(r)})$ vanishes
for all values of F if $\beta_1 = 0$; and $\beta_2(S_{(r)})$ may be greater or less than 3. If F is small and n
is large, (xxi)-(xxii) approach the limit for the replacement set-up, as we should expect, i.e.

$$\beta_1(S_{(r)}) = \frac{\beta_1}{r} \quad \text{and} \quad \beta_2(S_{(r)}) = 3 + \frac{\beta_2 - 3}{r}.$$


[Figure: histograms of 24-fold symmetrical universes of 3 classes (scores −1, 0, +1), each panel
showing the numbers in each class and the corresponding β₂ value.]

Fig. 125. Symmetrical 24-fold Universes of 3 classes with kurtosis (β₂) coefficients.
Note the scale is not uniform throughout.

[Figure: "3-Class Universe of 24 Items"; kurtosis of the score-sum plotted against sample size
(2 to 22) and the corresponding sampling fraction (1/12 to 11/12).]

Fig. 126. Variation of kurtosis coefficient (β₂) with size of sampling fraction for 24-fold universes
of 3 classes shown in Fig. 125.

For small values of F, we therefore expect that the score-sum distribution will be leptokurtic
only if that of the unit sample is also; but if F is large, the score-sum distribution may
be leptokurtic if that of the unit sample is platykurtic. In the limit, of course, $\beta_2(S_{(r)})$ becomes
infinite, since only one value ($S_n$) of the score-sum is consistent with exhaustive sampling.

A closer examination of the approximate formula (xxii) brings into focus what is perhaps 
a more remarkable feature of non-replacement sampling distribution than the symmetry of the 
half-universe sample. The expression 1 — 6F(1 — F) vanishes when F = 4 + 1/V12, i.e. 
F = 0-22 or 0-79 between which limits the coefficient of 8, in the second term of (xxii) is negative 
with a numerical maximum for F~0-59. Regardless of the structure of the universe, the 
sampling distribution will always be platykurtic unless the sampling fraction in round numbers 
is greater than four-fifths ; ceteris paribus, below this level a highly leptokurtic unit-sample 
distribution will generate a more platykurtic sampling distribution than a distribution which 
is initially flatter than the normal curve, e.g. a rectangular or indeed even a U-shaped one. At 
F œ 0-59 the kurtosis is a minimum and (xxii) is approximately 


$$\beta_2(S_{(r)}) \simeq 3 - \frac{3 + 1.1\,\beta_2}{r}.$$


This seemingly paradoxical characteristic of any non-replacement distribution comes into focus (Fig. 125) if we calculate from the exact formula (xx) the kurtosis for samples of different sizes extracted without replacement from a symmetrical 24-fold universe of only 3 score classes. For simplicity, we may assign to the 3 classes scores of −1, 0 and +1, and frequencies ($p_a$, $p_b$, $p_c$) as below, with $\beta_2$ values for the unit-sample distribution in the range 1-12, including a rectangular and a U-shaped contour at the lower limit of kurtosis.


  −1     0    +1     β₂          −1     0    +1     β₂
   1    22     1    12.0          5    14     5    2.4
   2    20     2     6.0          6    12     6    2.0
   3    18     3     4.0          8     8     8    1.5
   4    16     4     3.0         11     2    11    1.1
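
The unit-sample kurtosis values tabulated above, and their dependence on the size of the sample, can be checked by direct enumeration. What follows is a minimal sketch in Python and not the author's own procedure: the function name `score_sum_kurtosis` and the argument `counts` (numbers of items scored −1, 0 and +1) are assumptions introduced for illustration. Each admissible sample composition is weighted by the number of ways of choosing it without replacement, and the kurtosis is the ratio of the fourth mean moment to the square of the second.

```python
from fractions import Fraction
from math import comb


def score_sum_kurtosis(counts, r):
    """Exact kurtosis (beta_2) of the score-sum of an r-fold sample drawn
    without replacement from a universe whose items score -1, 0 or +1.

    counts = (items scored -1, items scored 0, items scored +1).
    Hypothetical helper; requires 1 <= r < n so that the variance is non-zero.
    """
    n_minus, n_zero, n_plus = counts
    n = n_minus + n_zero + n_plus
    total = comb(n, r)
    mean = Fraction(r * (n_plus - n_minus), n)
    m2 = Fraction(0)
    m4 = Fraction(0)
    for a in range(r + 1):              # items scored -1 in the sample
        for c in range(r - a + 1):      # items scored +1 in the sample
            b = r - a - c               # items scored 0 in the sample
            ways = comb(n_minus, a) * comb(n_zero, b) * comb(n_plus, c)
            if ways == 0:               # composition not available in this universe
                continue
            prob = Fraction(ways, total)
            dev = (c - a) - mean
            m2 += prob * dev ** 2
            m4 += prob * dev ** 4
    return m4 / m2 ** 2


if __name__ == "__main__":
    for counts in [(1, 22, 1), (4, 16, 4), (8, 8, 8), (12, 0, 12)]:
        row = [round(float(score_sum_kurtosis(counts, r)), 2) for r in (1, 6, 12, 18, 23)]
        print(counts, row)
```

Exact fractions are used so that the identity of the kurtosis for the r-fold and the (n − r)-fold sample can be confirmed without rounding error.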


The picture disclosed by Fig. 125 raises the question: what lower limit may $\beta_2(S_{(r)})$ attain if $\beta_2$ is as high as may be? This admits no simple answer because the size of the universe itself sets a limit both to the maximum value of $\beta_2$ and to the value of r consistent with the condition that the second term in (xxii) is both negative and numerically maximal, as when $F \simeq 0.6$. Thus a universe of 100 items assignable to 3 equally spaced score classes (−1, 0, +1) as above cannot have a kurtosis greater than 50, and the sample size consistent with a minimum value of $\beta_2(S_{(r)})$ is about 60.

For the rectangular universe $\beta_1 = 0$ and $\beta_2 \simeq 1.8$ when n is large. The possibility of generating a rectangular sampling distribution therefore implies that $3 + 1.1\beta_2 = 1.2r$. If n = 200 and F = 0.6, r = 120, so that $\beta_2$ could satisfy this relation only if $\beta_2 \simeq 130$. For the 200-fold binomial universe defined by (0.995 + 0.005), $\beta_2$ exceeds 130, but no other binomial 200-fold universe and no 200-fold universe of more than 2 non-zero classes can satisfy the condition $\beta_2 > 130$. From a 2-class universe of 1 zero score value and 199 unit score values the value of $\beta_2(S_{(r)})$ for the 120-fold sample would be 1.17; but the sample itself would contain only 2 score classes (viz. score sums of 120 and 119) as we see by expanding $(199 + 1)^{(120)}$. Though $\beta_2(S_{(r)})$ is in this case less than 1.8, the distribution of the sample score is monotonic.

From (xix)-(xx) we see that the first two Pearson coefficients of the r-fold and the (n − r)-fold sample are respectively identical. Thus $\beta_1(S_{(r)}) = \beta_1$ and $\beta_2(S_{(r)}) = \beta_2$ when r = (n − 1). If $\beta_2$ lies in the neighbourhood of 3(n − 1) ÷ (n + 1), the kurtosis of the r-fold distribution does not appreciably change within the range r = 1 to r = n − 1, e.g. when n = 24 and $\beta_2 = 2.76$ (Fig. 126).


19.05 DIFFERENCE DISTRIBUTION FOR NON-REPLACEMENT SAMPLING 


To derive the distribution of the difference between the raw scores of an a-fold and b-fold sample from the same finite universe we must retrace our steps to the derivation of (ix)-(xii) in 19.03. We there considered the form of the terms of the expansion of a multinomial expression such as $(p + q + r + s \ldots)^k$ containing m basic terms p, q, r, etc., all positive. For the (a + b)-fold score-sum the expressions $K_{2.0}$, $K_{1.1}$, etc. are obtainable by inserting (a + b) for m. We may write the raw-score difference in the form

$$(x_{1.a} + x_{2.a} + \ldots + x_{a.a} - x_{1.b} - x_{2.b} - \ldots - x_{b.b}),$$

whence the kth zero moment is

$${}_{(a-b)}\mu_k = E(x_{1.a} + x_{2.a} + \ldots + x_{a.a} - x_{1.b} - x_{2.b} - \ldots - x_{b.b})^k.$$

The expression on the right has a positive and b negative terms within the brackets; and we may classify the terms of the expansion as in the derivation of (ix)-(xii) of 19.03. The coefficients of corresponding classes will not be identical with $K_{2.0}$, etc. Accordingly, we shall label them as $H_{2.0}$, etc. The reader should first note that the correct interpretation of $(a - b)^{(r)}$ is consistent with Vandermonde's formula, if we write it as

$$(a - b)^{(r)} = \sum_{k=0}^{k=r} \binom{r}{k}\, a^{(r-k)} (-b)^{(k)}.$$

This would be strictly analogous to the ordinary binomial expansion $(a - b)^r$ if it were true that $(-b)^{(k)} = (-1)^k\, b^{(k)}$. For brevity, it is convenient to define by use of square brackets an expression which precisely conforms to the analogy, viz.:

$$(a - b)^{[r]} = \sum_{k=0}^{k=r} (-1)^k \binom{r}{k}\, a^{(r-k)}\, b^{(k)}.$$

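The reader may check that the pattern for the H coefficients is as follows:

$$\begin{aligned}
H_{2.0} &= (a + b), & H_{4.0} &= (a + b),\\
H_{1.1} &= (a - b)^{[2]}, & H_{3.1} &= 4(a - b)^{[2]},\\
H_{3.0} &= (a - b), & H_{2.2} &= 3(a + b)^{(2)},\\
H_{2.1} &= 3a^{(2)} - 3b^{(2)}, & H_{2.1.1} &= 6[a^{(3)} - ba^{(2)} - ab^{(2)} + b^{(3)}],\\
H_{1.1.1} &= (a - b)^{[3]}, & H_{1.1.1.1} &= (a - b)^{[4]}.
\end{aligned}$$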
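
The square-bracket power can be checked against a direct count. The sketch below is illustrative only; the helper names `falling`, `bracket_power` and `signed_tuple_count` are assumptions and do not occur in the text. It attaches the sign +1 to each of a items and −1 to each of b items, sums the product of signs over every ordered selection of r distinct items, and compares the total with the bracket expression, which is how coefficients such as $H_{1.1}$ and $H_{1.1.1}$ arise.

```python
from itertools import permutations
from math import comb, prod


def falling(x, k):
    """Factorial power x^(k) = x(x-1)...(x-k+1); equals 1 when k = 0."""
    return prod(x - j for j in range(k))


def bracket_power(a, b, r):
    """(a - b)^[r]: binomial-style expansion in factorial powers of a and b."""
    return sum((-1) ** k * comb(r, k) * falling(a, r - k) * falling(b, k)
               for k in range(r + 1))


def signed_tuple_count(a, b, r):
    """Sum of sign products over ordered r-tuples of distinct items,
    a items carrying sign +1 and b items carrying sign -1."""
    signs = [1] * a + [-1] * b
    return sum(prod(signs[i] for i in idx)
               for idx in permutations(range(a + b), r))


if __name__ == "__main__":
    print(bracket_power(3, 2, 2), signed_tuple_count(3, 2, 2))
```

The same count, multiplied by the number of ways of arranging the pattern of exponents, yields the remaining coefficients; for instance $H_{2.1.1}$ is 6(a + b − 2) times the second bracket power.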


We may then derive, by recourse to the H coefficients defined above, compact expressions for the moments of the raw-score difference distribution in terms of the moments of the u.s.d.:

$${}_{(a-b)}\mu_1 = (a - b)\mu_1,$$
$${}_{(a-b)}\mu_2 = (a + b)\mu_2 + (a - b)^{[2]}\mu_{1.1},$$
$${}_{(a-b)}\mu_3 = (a - b)\mu_3 + 3[a^{(2)} - b^{(2)}]\mu_{2.1} + (a - b)^{[3]}\mu_{1.1.1},$$
$${}_{(a-b)}\mu_4 = (a + b)\mu_4 + 4(a - b)^{[2]}\mu_{3.1} + 3(a + b)^{(2)}\mu_{2.2} + 6[a^{(3)} - ba^{(2)} - ab^{(2)} + b^{(3)}]\mu_{2.1.1} + (a - b)^{[4]}\mu_{1.1.1.1}.$$
We thus derive the following expressions for the first two Pearson coefficients of the raw-score difference distribution in terms of those of a u.s.d. referable to n items:

$${}_{(a-b)}\beta_1 = \frac{(a - b)^2\,\{n^2 - 3n(a + b) + 2(a - b)^2\}^2\,(n - 1)}{\{n(a + b) - (a - b)^2\}^3\,(n - 2)^2}\;\beta_1,$$

$${}_{(a-b)}\beta_2 = \frac{(n - 1)\,[\,s(n - s)\{n(n + 1) - 6s(n - s)\} + 16ab\{n(n + 1) - 3s(n - s) - 6ab\}\,]}{(n - 2)^{(2)}\,\{s(n - s) + 4ab\}^2}\;\beta_2 + \frac{3n(n - 1)\,[\,s(n - s)\{s(n - s) - n + 1\} + 8ab\{s(n - s) + 2(ab - n + 1)\}\,]}{(n - 2)^{(2)}\,\{s(n - s) + 4ab\}^2},$$

in which s = a + b.
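
Expressions of this kind are easy to mistranscribe, so a brute-force check is worth having. The sketch below is an illustration under stated assumptions, not the author's method: the function names and the 8-item test universe are hypothetical, and the two closed forms are written out in the function bodies as they are understood here, with s = a + b and $(n - 2)^{(2)} = (n - 2)(n - 3)$. It enumerates every pair of disjoint a-fold and b-fold samples from a small universe and compares the exact Pearson coefficients of the difference with the closed forms.

```python
from fractions import Fraction
from itertools import combinations


def mean_moment(values, k):
    """k-th moment about the mean of a list of equally weighted values."""
    mean = Fraction(sum(values), len(values))
    return sum((Fraction(v) - mean) ** k for v in values) / len(values)


def difference_scores(universe, a, b):
    """Raw-score differences (a-fold sum minus b-fold sum) for every pair of
    disjoint samples taken without replacement; all cases equally likely."""
    items = range(len(universe))
    scores = []
    for first in combinations(items, a):
        rest = [i for i in items if i not in first]
        sum_a = sum(universe[i] for i in first)
        for second in combinations(rest, b):
            scores.append(sum_a - sum(universe[i] for i in second))
    return scores


def beta1_formula(n, a, b, beta1):
    """Closed form for the first Pearson coefficient of the difference."""
    d2 = (a - b) ** 2
    num = d2 * (n * n - 3 * n * (a + b) + 2 * d2) ** 2 * (n - 1)
    den = (n * (a + b) - d2) ** 3 * (n - 2) ** 2
    return Fraction(num, den) * beta1


def beta2_formula(n, a, b, beta2):
    """Closed form for the second Pearson coefficient of the difference."""
    s, t = a + b, a * b
    v = s * (n - s) + 4 * t
    den = (n - 2) * (n - 3) * v ** 2
    coef = (n - 1) * (s * (n - s) * (n * (n + 1) - 6 * s * (n - s))
                      + 16 * t * (n * (n + 1) - 3 * s * (n - s) - 6 * t))
    const = 3 * n * (n - 1) * (s * (n - s) * (s * (n - s) - n + 1)
                               + 8 * t * (s * (n - s) + 2 * (t - n + 1)))
    return Fraction(coef, den) * beta2 + Fraction(const, den)


if __name__ == "__main__":
    universe = [0, 1, 1, 2, 3, 5, 8, 13]   # hypothetical 8-fold universe
    b2 = mean_moment(universe, 4) / mean_moment(universe, 2) ** 2
    d = difference_scores(universe, 2, 2)
    print(mean_moment(d, 4) / mean_moment(d, 2) ** 2, beta2_formula(8, 2, 2, b2))
```

Enumeration is feasible only for very small universes, but agreement in exact arithmetic for unequal as well as equal sample sizes is a useful safeguard before the formulae are applied to larger n.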


Both expressions simplify greatly, if we choose samples of equal size (a = b), in which event

$${}_{(a-b)}\beta_1 = 0$$

and

$${}_{(a-b)}\beta_2 = \frac{(n - 1)}{2an(n - 2)^{(2)}}\,\{n(n - 6a) + n + 6a\}\,\beta_2 + \frac{3(n - 1)}{2an(n - 2)^{(2)}}\,\{2a(n^2 - 3n + 3) - n(n - 1)\}.$$


If a = n(n + 1) ÷ 6(n − 1) = b, it is thus apparent that the difference distribution is symmetrical; and the value of the second Pearson coefficient is independent of $\beta_2$, i.e. of the structure of the universe. On substitution of this sample size in the expression above we find that ${}_{(a-b)}\beta_2$ reduces to 3(n − 1) ÷ (n + 1); but the interpretation of this result is meaningful only within the framework of the assumption that both n and a must be integers. Evidently the coefficient of $\beta_2$ will be small if $n \simeq 6a$, i.e. each sample is a one-sixth fraction of the universe of choice. For large values of n we may thus say that an overall sampling fraction of one-third will ensure that the kurtosis of the difference distribution is independent of the kurtosis of the u.s.d. More generally for n = 6a, ${}_{(a-b)}\beta_2$ reduces to

$$\frac{6(n - 1)^2\,\beta_2}{n^{(4)}} + \frac{3(n - 1)^2(n^2 - 6n + 6)}{n^{(4)}}.$$


The maximum finite value of $\beta_2$ occurs in the binary universe, the frequencies of the classes being $n^{-1}$ and $(n - 1)n^{-1}$ respectively, one class being then represented by only one member. The second Pearson coefficient of its u.s.d. is (n² − 3n + 3) ÷ (n − 1). This is its maximum value; and the maximum value of ${}_{(a-b)}\beta_2$ is therefore exactly 3. Thus the difference distribution is necessarily platykurtic and the greatest contribution which can be made by the term involving $\beta_2$ is 6(n² − 3n + 3) ÷ n(n − 2)^{(2)}. The table below shows for various values of n the values of the two terms in ${}_{(a-b)}\beta_2$ on the assumption that $\beta_2$ has its maximum value, as above.

   n    1st term   2nd term        n    1st term   2nd term
   6      1.75       1.25         30      0.21       2.79
  12      0.62       2.38         42      0.15       2.85

Even if the u.s.d. of the binary universe is very platykurtic, we therefore see that samples of size equal to one-sixth of the universe will generate a symmetrical difference distribution having a second Pearson coefficient greater than or equal to 2.8 if n is greater than or equal to 30. For any universe of 30 or more items, regardless of the number of score classes and of items in each, there is good reason to assume that the first two Pearson coefficients of the distribution of the difference referable to equal samples of one-sixth will lie close to their normal values. However, this does not suffice to justify the conclusion that the normal curve will give an adequate quadrature for the sample difference distribution. An examination of how gratuitous such an assumption may be will indeed give us some insight into circumstances which guarantee a good fit.

In particular, we recall the case of sampling from a 2-class universe. Without restriction on the values of a and b, the difference distribution is then definable as follows for a u.s.d. of score values differing by unit increment:

Difference scores      −1            0            +1
Frequencies           a/n     (n − a − b)/n      b/n

When a = b and F = (a + b) ÷ n is the total sampling fraction, this reduces to

Difference scores      −1        0        +1
Frequencies           F/2     1 − F      F/2

Thus ${}_{(a-b)}\beta_1 = 0$ and ${}_{(a-b)}\beta_2 = 3$, if a = b and F = 1/3. The difference distribution is then a special case of what we have elsewhere called (14.05) the burette universe, viz.:

Score          −1      0      +1
Frequency     1/6     2/3     1/6

We may here make use of results obtainable (14.07) from sampling in the burette (infinite discrete 
3-class) universe by stating at this stage without proof the following conclusion : when the 
first two Pearson coefficients of a distribution involving 20 score classes are very close to their 
normal values, we may confidently invoke the normal distribution for purposes of quadrature 
adequate for statistical usage. It is therefore immaterial to examine the implications of the
foregoing formulae for ${}_{(a-b)}\beta_1$ and ${}_{(a-b)}\beta_2$ more closely. It suffices to state of any finite
unimodal universe that :


(a) the first 2 coefficients of the non-replacement difference distribution w.r.t. a-fold and 
b-fold samples will lie very close to their normal values if both the following conditions hold 
good : 


(i) the sample sizes are equal (a = b) ; 
(ii) the total sampling fraction (F = (a + b) ÷ n) is in the neighbourhood of one-third ;


(b) the normal curve will then give a satisfactory quadrature if the distribution of the a-fold
sample is referable to at least 10 different score values.


19.06 THE SO-CALLED CHI-SQUARE STATISTIC


In Vol. I we have sufficiently clarified the distinction between two methods of scoring respec- 
tively referred to throughout this book as taxonomic and representative. When we score by 
the former method we specify a sample by the number of individuals in each of an exhaustive 
set of exclusive classes. When the classification is binary every individual belongs to class A 
or to class B. If there are a individuals of an r-fold sample belonging to class A, there must
therefore be (r − a) = b individuals in class B. Conversely, a = (r − b); and one score
suffices to define a sample of known size. For example, it is immaterial whether we specify 
a 12-fold sample of peas classified as green and yellow by the fact that it contains 5 green or 
7 yellow. 

When our concern is with more than two classes, this is not so. If there are N classes, our knowledge of the r-fold sample is not complete unless we can specify the score of (N − 1) classes. For instance, we can exhaustively specify one and the same flock of 25 Andalusian fowls classified as white, black and blue in three different ways, viz.:


       (i)                (ii)               (iii)
White    12         White    12         Black    7
Black     7         Blue      6         Blue     6

To avoid periphrasis, we may speak of a classification involving more than 2 classes as 
manifold. The problem we shall examine in this chapter is the correspondence between hypo- 
thesis and expectation in a manifold system. Thus we might wish to know whether the com- 
position of the 25-fold sample cited above is statistically consistent with the requirements of 
the Mendelian ratio: 1:2:1 for white, blue and black respectively. 

We can, of course, specify the probability (P) of getting a sample of a given composition 
by recourse to the multinomial theorem in ordinary (replacement) or factorial (non-replacement) 
powers. For our Andalusian flock the data are 


                             White    Blue    Black    Total
Observed numbers               12       6       7        25
Unit-sample expectation       1/4      1/2     1/4        1


On the assumption that the universe is indefinitely large, we consider that we are sampling with replacement and put

$$P = \frac{25!}{12!\;6!\;7!}\left(\tfrac{1}{4}\right)^{12}\left(\tfrac{1}{2}\right)^{6}\left(\tfrac{1}{4}\right)^{7}.$$
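
The probability of one particular composition is readily computed. The sketch below is an illustration only; the function name `multinomial_probability` is an assumption, and the figures are the flock of 12 white, 6 blue and 7 black fowls with unit-sample expectations of 1/4, 1/2 and 1/4 cited above. Exact fractions avoid rounding until the final conversion.

```python
from fractions import Fraction
from math import factorial


def multinomial_probability(observed, expectations):
    """Probability of one sample composition when sampling with replacement."""
    r = sum(observed)
    coefficient = factorial(r)
    for count in observed:
        coefficient //= factorial(count)   # each partial quotient is an integer
    prob = Fraction(coefficient)
    for count, p in zip(observed, expectations):
        prob *= Fraction(p) ** count
    return prob


flock = (12, 6, 7)                                           # white, blue, black
ratio = (Fraction(1, 4), Fraction(1, 2), Fraction(1, 4))     # Mendelian 1 : 2 : 1
P = multinomial_probability(flock, ratio)

if __name__ == "__main__":
    print(float(P))
```

A single such probability is small for any sample of moderate size, which is why the text goes on to seek a statistic summarising the deviations of all classes together.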


In dealing with a 2-class system, we commonly specify unit sample expectation w.r.t. choice of an item of class B as q = (1 − p), that of a choice of a single item of class A being p. In a manifold system of more than 2 classes, no such unique relation exists between the unit sample expectation w.r.t. class A and that w.r.t. class B. Accordingly, we shall write the unit sample distribution for a system of N classes as

$$p_a + p_b + p_c + \ldots + p_N.$$

In this expression $p_a$ is the unit sample expectation w.r.t. class A, and $(1 - p_a) = q_a$ is the expectation that a choice of a single item will not belong to class A. Where occasion arises, we may write $q_b = (1 - p_b)$, $q_c = (1 - p_c)$, etc. For a 3-class system therefore

$$p_c = (1 - p_a - p_b) \quad\text{and}\quad q_c = p_a + p_b. \tag{i}$$

When our concern is with the expected, i.e. mean, value of a score referable to one class alone, the classification involved is essentially binary. For instance, an item either belongs to class C or does not belong to class C. If f(b) is any function of the raw score of the B-score distribution, we may thus write its expected value as for a binomial variate, viz.:

$$E\,f(b) = \sum_{b=0}^{b=r} f(b)\,\frac{r!}{b!\,(r - b)!}\;p_b^{\,b}\,q_b^{\,r-b}. \tag{ii}$$

In particular, we may write

$$E(b) = rp_b = M_b, \tag{iii}$$
$$E(b - M_b)^2 = rp_bq_b, \tag{iv}$$
$$E(b^2) = r\,p_b(1 - p_b) + r^2p_b^2 = r\,p_bq_b + r^2p_b^2. \tag{v}$$


In what follows we shall thus adopt the following schema for a 3-class system:


Class                       A                 B                 C                 Total
Unit sample expectation     p_a               p_b               p_c               (p_a + p_b + p_c) = 1
Observed nos.               a                 b                 c                 (a + b + c) = r
Expected (mean) nos.        M_a = rp_a        M_b = rp_b        M_c = rp_c        (M_a + M_b + M_c) = r
Score deviations            u = (a − M_a)     v = (b − M_b)     w = (c − M_c)     (u + v + w) = 0


In the symbolism of the foregoing schema, the probability ($P_{a.b.c}$) of getting the particular set of score values a, b, c defined above is

(a) With replacement:

$$P_{a.b.c} = \frac{r!}{a!\,b!\,c!}\;p_a^{\,a}\,p_b^{\,b}\,p_c^{\,c}; \tag{vi}$$

(b) Without replacement (from a universe of n items):

$$P_{a.b.c} = \frac{r!}{a!\,b!\,c!}\cdot\frac{(np_a)^{(a)}\,(np_b)^{(b)}\,(np_c)^{(c)}}{n^{(r)}}. \tag{vii}$$


In practice, we must assume that the choice of any one such set of score values is trivial 
if r is fairly large. What we want to assess is the overall probability that the score deviations 
u, v, w will not be excessive. With this end in view our task is to devise a manageable statistic 
which brings score deviations of all the constituent classes into the picture. We can get a clue 
to the fulfilment of this aim, from the customary procedure for assessing the expectation of an 
excessive discrepancy between hypothesis and expectation, when our concern is with only two 
classes. Our system is then :

                            A                 B                 Total
Unit sample expectation     p_a               p_b               (p_a + p_b) = 1
Observed nos.               a                 b                 (a + b) = r
Expected nos.               M_a = rp_a        M_b = rp_b        (M_a + M_b) = r
Score deviations            u = (a − M_a)     v = (b − M_b)     u + v = 0


The statistic commonly prescribed when our taxonomy is binary is the standard score here denoted as c. Except when either $p_a$ or $p_b$ is very small, its replacement distribution for large samples is approximately normal with unit variance, its square ($C_2$) being then approximately a Chi-Square variate of 1 d.f. We define it by the equivalent alternative relations

$$c^2 = C_2 = \frac{u^2}{rp_ap_b} = \frac{v^2}{rp_ap_b}. \tag{viii}$$

The identity so stated depends on the following relations, which suggest an alternative definition of $C_2$ involving both u and v:

$$(a - M_a)^2 = (b - M_b)^2; \qquad \frac{1}{p_a} + \frac{1}{p_b} = \frac{1}{p_ap_b},$$
$$\frac{u^2}{rp_ap_b} = \frac{u^2}{rp_a} + \frac{u^2}{rp_b} = \frac{u^2}{rp_a} + \frac{v^2}{rp_b},$$
$$\therefore\; C_2 = \frac{(a - M_a)^2}{M_a} + \frac{(b - M_b)^2}{M_b}. \tag{ix}$$

Since the binomial statistic $C_2$, elsewhere denoted $c^2$, is approximately a Chi-Square variate of 1 d.f., its expected value (first zero moment) is unity, as we see from the following considerations. The expected value of $(a - M_a)^2 = u^2$ and of $(b - M_b)^2 = v^2$ is the variance of the raw-score distribution, i.e.

$$E(u^2) = rp_ap_b = E(v^2),$$
$$E(C_2) = \frac{E(u^2)}{rp_a} + \frac{E(v^2)}{rp_b} = p_b + p_a = 1.$$

If we write $a = x_1$ and $b = x_2$ to make (ix) adaptable as a particular case of a more general expression, it takes the form

$$C_2 = \sum_{s=1}^{s=2} \frac{(x_s - M_s)^2}{M_s}.$$

This suggests a statistic which we may define for a system of N classes as

$$C_N = \sum_{s=1}^{s=N} \frac{(x_s - M_s)^2}{M_s}. \tag{x}$$

It is easy to see that the expected value of (x) is N − 1, i.e. that of a Chi-Square variate of f = (N − 1) degrees of freedom for all values of N. This follows from the fact that any manifold system can be regarded as binary w.r.t. any one class. Thus $(1 - p_s)$ is the probability that a unit sample will not belong to class S, if $p_s$ is the probability that it will do so; and $rp_s(1 - p_s)$ is the expected value of the square score deviation $(x_s - M_s)^2$, i.e. the variance of the distribution of the S-score. Hence we may write

$$E(x_s - M_s)^2 = rp_s(1 - p_s).$$

Since $M_s = rp_s$,

$$E(C_N) = \sum_{s=1}^{s=N} (1 - p_s) = N - \sum_{s=1}^{s=N} p_s.$$

By definition

$$(p_a + p_b + p_c \ldots + p_N) = 1,$$
$$\therefore\; E(C_N) = N - 1. \tag{xi}$$
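
The result that the expected value of the statistic is N − 1, whatever the class proportions or the size of the sample, can be confirmed by enumerating every possible sample. The sketch below is illustrative; the helper names `compositions`, `probability`, `statistic` and `expected_statistic` are assumptions, and the class proportions must be supplied as exact fractions. It also evaluates the statistic for the flock of 25 Andalusian fowls under the 1 : 2 : 1 hypothesis.

```python
from fractions import Fraction
from math import factorial


def compositions(r, parts):
    """All ways of allotting r items to `parts` classes."""
    if parts == 1:
        yield (r,)
        return
    for first in range(r + 1):
        for rest in compositions(r - first, parts - 1):
            yield (first,) + rest


def probability(sample, p):
    """Multinomial (replacement) probability of one sample composition."""
    coefficient = factorial(sum(sample))
    for x in sample:
        coefficient //= factorial(x)
    prob = Fraction(coefficient)
    for x, ps in zip(sample, p):
        prob *= ps ** x
    return prob


def statistic(sample, p):
    """C_N = sum over classes of (x_s - M_s)^2 / M_s, with M_s = r * p_s."""
    r = sum(sample)
    return sum((x - r * ps) ** 2 / (r * ps) for x, ps in zip(sample, p))


def expected_statistic(r, p):
    """Mean value of C_N over every r-fold sample."""
    return sum(probability(s, p) * statistic(s, p)
               for s in compositions(r, len(p)))


if __name__ == "__main__":
    mendel = (Fraction(1, 4), Fraction(1, 2), Fraction(1, 4))   # white, blue, black
    print(expected_statistic(5, mendel))          # mean over all 5-fold samples
    print(float(statistic((12, 6, 7), mendel)))   # the flock cited in the text
```

The enumeration grows rapidly with r and N, so it serves only as a check on small cases.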


Thus the statistic $C_2$ of the binary system is a particular case of a more general pattern which takes account of score deviations of all the constituent N classes, and the expected value of this statistic is that of a Chi-Square variate of N − 1 degrees of freedom. For the 3-class replacement system whose general term is (x) above

$$C_3 = \frac{u^2}{M_a} + \frac{v^2}{M_b} + \frac{w^2}{M_c}. \tag{xii}$$

In this case, N = 3 and $(N - 1) = 2 = E(C_3)$; and we shall later explore the possibility that $C_3$ is in fact approximately expressible as a Chi-Square variate of 2 d.f. The procedure will make it sufficiently clear that the rule suggested holds good when N > 3.

It will clarify our task if we first re-examine the implications of the statement that the square standard score deviation ($C_2$) of a binomial distribution has approximately the distribution of Chi-Square for 1 d.f. We have previously arrived at this conclusion by the following route:


(i) The distribution of the score deviation $(a - rp_a) = u$ tallies closely with that of a normal variate with variance $rp_a(1 - p_a)$ for large values of r unless $p_a$ or $1 - p_a$ is small compared with the reciprocal of r, i.e. for large values of r and $rp_a > 10$;

(ii) Subject to the qualifications stated, the ratio (c) of u to $\sqrt{rp_a(1 - p_a)}$ is therefore approximately a normal variate of unit variance;

(iii) Since the square of a normal variate of unit variance is a Chi-Square variate of 1 d.f., the ratio of $u^2$ to $rp_a(1 - p_a)$ is also approximately, within the framework of the same qualifications, a Chi-Square variate of 1 d.f.


Let us be clear that we are not speaking of the exact distribution of $C_2 = c^2$ in these terms. Accordingly, we might regard the problem as that of finding a good fitting curve for it by the method of moments. Now we have seen in 13.01-13.02 that the following relations hold good for the mean moments of the normal variate (c) of unit variance, and the zero moments of the Chi-Square variate (C) for 1 d.f.:

$$\mu_k(C) = m_{2k}(c).$$

In this expression, we have seen that $m_2(c) = 1$, $m_4(c) = 3$, $m_6(c) = 15$ and $m_8(c) = 105$. Hence

$$\mu_1(C) = 1;\quad \mu_2(C) = 3;\quad \mu_3(C) = 15;\quad \mu_4(C) = 105. \tag{xiii}$$

If $C_2$ defined by (ix) above is a statistic for which we seek a fitting curve, we may proceed to determine its moments as follows:

$$\mu_2(C_2) = E(C_2^2) = E\left(\frac{u^2}{M_a} + \frac{v^2}{M_b}\right)^2.$$

Since $u^2 = v^2$ in a 2-class system, we may here put

$$\mu_2(C_2) = E(u^4)\left(\frac{1}{M_a} + \frac{1}{M_b}\right)^2 = \frac{E(u^4)\,(M_a + M_b)^2}{M_a^2M_b^2}.$$

In this expression $E(u^4)$ is the mean value of the 4th power of the raw-score deviation of the binomial distribution, i.e. its 4th mean moment ($m_4$). Also

$$(M_a + M_b)^2 = r^2 \quad\text{and}\quad M_a^2M_b^2 = r^4p_a^2(1 - p_a)^2,$$
$$\therefore\; \frac{(M_a + M_b)^2}{M_a^2M_b^2} = \frac{1}{r^2p_a^2(1 - p_a)^2} = \frac{1}{m_2^2},$$
$$\therefore\; \mu_2(C_2) = \frac{m_4}{m_2^2} = \beta_2.$$

In 14.05 we have obtained the value of $\beta_2$ for the r-fold sample from a universe whose unit sample distribution is (q + p), viz.:

$$\beta_2 = 3 + \frac{1 - 6pq}{rpq}.$$

Evidently, the second zero moment of the $C_2$ distribution tends to 3 when r is large and the reciprocal of either $p_a$ or $p_b$ is small in comparison with r. In the same way, we may see that $\mu_3(C_2)$ and $\mu_4(C_2)$ approach the values of $\mu_3(C)$ and $\mu_4(C)$ defined by (xiii) above. The numerical values 1, 3, 15, 105 for the particular case f = 1, i.e. the Chi-Square variate of 1 d.f., illustrate the more general rule:

$$\mu_1 = f;\quad \mu_2 = f(f + 2);\quad \mu_3 = f(f + 2)(f + 4);\quad \mu_4 = f(f + 2)(f + 4)(f + 6),\ \text{etc.}$$
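
Both statements of this paragraph lend themselves to a numerical check. The sketch below is illustrative only; the function names `chi_square_zero_moment` and `binomial_statistic_moment` are assumptions. The first applies the product rule for the zero moments of a Chi-Square variate of f d.f.; the second computes the exact zero moments of the binomial statistic $C_2 = u^2/rpq$ by summation over the terms of the binomial, so that the second moment can be compared with 3 + (1 − 6pq)/rpq.

```python
from fractions import Fraction
from math import comb


def chi_square_zero_moment(f, k):
    """k-th zero moment f(f+2)...(f+2k-2) of a Chi-Square variate of f d.f."""
    value = 1
    for j in range(k):
        value *= f + 2 * j
    return value


def binomial_statistic_moment(r, p, k):
    """Exact k-th zero moment of C_2 = u^2 / (r p q) for an r-fold sample;
    p should be an exact Fraction."""
    q = 1 - p
    return sum(comb(r, a) * p ** a * q ** (r - a)
               * ((a - r * p) ** 2 / (r * p * q)) ** k
               for a in range(r + 1))


if __name__ == "__main__":
    print([chi_square_zero_moment(1, k) for k in range(1, 5)])
    p = Fraction(1, 4)
    for r in (20, 200, 2000):
        print(r, float(binomial_statistic_moment(r, p, 2)))
```

Running the loop for increasing r shows how slowly or quickly the second moment settles towards 3 for a given value of p.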


The statistic defined by (xii) is referable to a 3-class system, and the hypothesis we are exploring is that its approximate replacement distribution for large samples is that of a Chi-Square variate of (3 − 1) = 2 d.f. If f = 2 in the above

$$\mu_1 = 2;\quad \mu_2 = 8;\quad \mu_3 = 48;\quad \mu_4 = 384. \tag{xiv}$$

We have already seen that $\mu_1 = 2$ when N = 3. We shall now explore the possibility that the limiting values of $\mu_2$, etc., for large values of r conform with the above.

Before proceeding further, we may pause to refer to an ambiguity of current terminology. Pearson developed the theory of the distribution referred to as the Chi-Square distribution on the assumption of continuous variation; and it is in that sense that we speak of a Chi-Square variate elsewhere in this volume. The exact distribution of the function of sums of squares defined by (x) is necessarily discrete. Accordingly, it is misleading to speak of $C_N$ in (x) as a Chi-Square variate, and a fortiori misleading to define a Chi-Square variate as a sum of squares so weighted. What we can say is what we have already found reasons for suspecting, viz. that the Chi-Square for (N − 1) degrees of freedom is a good fitting curve for the sampling distribution of $C_N$ when r is very large. Only on that understanding can we use the table of the appropriate Chi-Square integral with propriety to evaluate the expectation that $C_N$ will exceed a certain numerical value.


19.07 THE MOMENTS OF THE SO-CALLED CHI-SQUARE STATISTIC


We may write the statistic defined by (xii) in 19.06 in the form:

$$C_3 = \frac{(a - M_a)^2}{M_a} + \frac{(b - M_b)^2}{M_b} + \frac{(c - M_c)^2}{M_c}$$
$$= \frac{a^2}{M_a} + \frac{b^2}{M_b} + \frac{c^2}{M_c} - 2(a + b + c) + (M_a + M_b + M_c)$$
$$= \frac{a^2}{M_a} + \frac{b^2}{M_b} + \frac{c^2}{M_c} - r.$$

Whence we may put

$$E(C_3 + r)^2 = E\left(\frac{a^2}{M_a} + \frac{b^2}{M_b} + \frac{c^2}{M_c}\right)^2,$$
$$\therefore\; E(C_3^2) + 2rE(C_3) + r^2 = \frac{E(a^4)}{M_a^2} + \frac{E(b^4)}{M_b^2} + \frac{E(c^4)}{M_c^2} + \frac{2E(a^2b^2)}{M_aM_b} + \frac{2E(a^2c^2)}{M_aM_c} + \frac{2E(b^2c^2)}{M_bM_c}.$$

In the preceding expression, $E(C_3) = 2$, being the mean value of $C_3$ as already shown. The expression $E(a^4)$ is the 4th zero moment of the A class score, and we may write it accordingly as $\mu_4(a)$. For brevity we may also write

$$E(a^2b^2) = \mu_{2.2.0};\quad E(a^2c^2) = \mu_{2.0.2};\quad E(b^2c^2) = \mu_{0.2.2}.$$

Thus the foregoing expression reduces to

$$\mu_2(C_3) = \frac{\mu_4(a)}{M_a^2} + \frac{\mu_4(b)}{M_b^2} + \frac{\mu_4(c)}{M_c^2} + \frac{2\mu_{2.2.0}}{M_aM_b} + \frac{2\mu_{2.0.2}}{M_aM_c} + \frac{2\mu_{0.2.2}}{M_bM_c} - r(r + 4). \tag{i}$$

The hypothesis that the Chi-Square distribution for 2 d.f. is a satisfactory fitting curve for the 3-class statistic $C_3$ requires inter alia that

$$\mu_2(C_3) \simeq 8. \tag{ii}$$
To evaluate the variance of the distribution of $C_3$ as defined by (i) it is necessary to find expressions for $\mu_{2.2.0}$, etc. More generally, the evaluation of higher moments presupposes that we can find expressions for

$$\mu_{h.i.j} = E(a^hb^ic^j).$$

Since we can express c = (r − a − b) in terms of a, b and r the fixed size of the sample, we can always transform co-moments of the above form to the simpler pattern illustrated by the following:

$$\mu_{2.2.2} = E[a^2b^2(r - a - b)^2]$$
$$= E[a^2b^2(r^2 + a^2 + b^2 - 2ra - 2rb + 2ab)]$$
$$= r^2E(a^2b^2) - 2rE(a^3b^2) - 2rE(a^2b^3) + 2E(a^3b^3) + E(a^4b^2) + E(a^2b^4),$$
$$\therefore\; \mu_{2.2.2} = r^2\mu_{2.2.0} - 2r\mu_{3.2.0} - 2r\mu_{2.3.0} + 2\mu_{3.3.0} + \mu_{4.2.0} + \mu_{2.4.0}. \tag{iii}$$

It will thus suffice for the purpose of evaluating moments of any order, if we define $\mu_{h.i.0}$.

We can do this directly by recourse to the grid symbolism of 11.01-11.04 employed in our treatment of the two card-pack model of 12.03; but it will be instructive if we also perform the operation by recourse to first principles. The general term of the distribution which defines the r-fold sample frequency of the particular score values a, b and c is

$$\frac{r!}{a!\,b!\,c!}\;p_a^{\,a}\,p_b^{\,b}\,p_c^{\,c}.$$

If we write $p_a = (1 - q_a)$,

$$\frac{p_b}{q_a} + \frac{p_c}{q_a} = 1 \quad\text{and}\quad q_a^{\,r-a} = q_a^{\,b}\,q_a^{\,c}.$$

Hence we may put

$$\frac{r!}{a!\,b!\,c!}\;p_a^{\,a}\,p_b^{\,b}\,p_c^{\,c} = \frac{r!}{a!\,(r - a)!}\;p_a^{\,a}\,q_a^{\,r-a}\cdot\frac{(r - a)!}{b!\,c!}\left(\frac{p_b}{q_a}\right)^{b}\left(\frac{p_c}{q_a}\right)^{c}. \tag{iv}$$

By definition, the mean value of $a^hb^i$ is given by

$$E(a^hb^i) = \sum_{a=0}^{a=r}\;\sum_{b=0}^{b=r-a} a^hb^i\,\frac{r!}{a!\,b!\,c!}\;p_a^{\,a}\,p_b^{\,b}\,p_c^{\,c}.$$

Whence by (iv)

$$E(a^hb^i) = \sum_{a=0}^{a=r} a^h\,\frac{r!}{a!\,(r - a)!}\;p_a^{\,a}\,q_a^{\,r-a}\sum_{b=0}^{b=r-a} b^i\,\frac{(r - a)!}{b!\,c!}\left(\frac{p_b}{q_a}\right)^{b}\left(\frac{p_c}{q_a}\right)^{c}. \tag{v}$$

In (v) above, we may write

$$p_{ba} = \frac{p_b}{q_a} \quad\text{and}\quad 1 - p_{ba} = q_{ba} = \frac{p_c}{q_a}.$$

Since (r − a) = (b + c), the general term of the binomial $(q_{ba} + p_{ba})^{r-a}$ is

$$\frac{(r - a)!}{b!\,c!}\;p_{ba}^{\,b}\,q_{ba}^{\,c} = \frac{(r - a)!}{b!\,c!}\left(\frac{p_b}{q_a}\right)^{b}\left(\frac{p_c}{q_a}\right)^{c}.$$

Thus the second factor on the right of (v) is the (r − a)-fold sample weighted mean value of $b^i$ when the unit sample expectation of extracting an item of class B is $p_{ba}$. It is therefore the ith zero moment of the distribution defined by successive terms of the expansion of $(q_{ba} + p_{ba})^{r-a}$.* We may write it therefore as $\mu_i(b_a)$. In particular,

$$\mu_1(b_a) = (r - a)p_{ba};$$
$$\mu_2(b_a) = (r - a)p_{ba}q_{ba} + (r - a)^2p_{ba}^2$$
$$= rp_{ba}q_{ba} - p_{ba}(q_{ba} + 2rp_{ba})\,a + p_{ba}^2(a^2 + r^2).$$


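* In the symbolism of 11.01-11.04, used elsewhere in 12.03, the operation illustrated by (v) is equivalent to writing

$$\mu_{h.i.0} = E(a^hb^i) = E_a[a^h \cdot E_{b.a}(b^i)].$$

The operation $E_{b.a}(b^i)$ here signifies taking the mean of the ith power of b for a fixed value of a. Hence it is the ith moment of the (r − a)-fold distribution for a residual universe in which the proportion of items of class B is $p_{ba}$. It is evidently immaterial which way we write

$$E_a[a^h \cdot E_{b.a}(b^i)] = \mu_{h.i.0} = E_b[b^i \cdot E_{a.b}(a^h)].$$


By substitution in (v) we thus obtain

$$E(a^2b^2) = \sum_{a=0}^{a=r} a^2\,\frac{r!}{a!\,(r - a)!}\;p_a^{\,a}\,q_a^{\,r-a}\,[\,rp_{ba}q_{ba} - p_{ba}(q_{ba} + 2rp_{ba})\,a + p_{ba}^2(a^2 + r^2)\,]$$
$$= (r^2p_{ba}^2 + rp_{ba}q_{ba})\,\mu_2(a) - p_{ba}(q_{ba} + 2rp_{ba})\,\mu_3(a) + p_{ba}^2\,\mu_4(a).$$

Alternatively we can express the general term of the distribution in the form

$$\frac{r!}{b!\,(r - b)!}\;p_b^{\,b}\,q_b^{\,r-b}\cdot\frac{(r - b)!}{a!\,c!}\left(\frac{p_a}{q_b}\right)^{a}\left(\frac{p_c}{q_b}\right)^{c} = \frac{r!}{b!\,(r - b)!}\;p_b^{\,b}\,q_b^{\,r-b}\cdot\frac{(r - b)!}{a!\,c!}\;p_{ab}^{\,a}\,q_{ab}^{\,c}.$$

Whence we may also write

$$E(a^2b^2) = (r^2p_{ab}^2 + rp_{ab}q_{ab})\,\mu_2(b) - p_{ab}(q_{ab} + 2rp_{ab})\,\mu_3(b) + p_{ab}^2\,\mu_4(b).$$

Hence we have

$$2E(a^2b^2) = (r^2p_{ba}^2 + rp_{ba}q_{ba})\,\mu_2(a) + (r^2p_{ab}^2 + rp_{ab}q_{ab})\,\mu_2(b) - p_{ba}(q_{ba} + 2rp_{ba})\,\mu_3(a) - p_{ab}(q_{ab} + 2rp_{ab})\,\mu_3(b) + p_{ba}^2\,\mu_4(a) + p_{ab}^2\,\mu_4(b).$$

In this expression

$$p_{ba} = \frac{p_b}{q_a}, \quad q_{ba} = 1 - \frac{p_b}{q_a} = \frac{p_c}{q_a}, \ \text{etc.}$$

Since $M_a = rp_a$ and $M_b = rp_b$ we derive

$$\frac{2E(a^2b^2)}{M_aM_b} = \frac{M_b + p_c}{M_aq_a^2}\,\mu_2(a) - \frac{p_c + 2M_b}{rM_aq_a^2}\,\mu_3(a) + \frac{M_b}{r^2M_aq_a^2}\,\mu_4(a) + \frac{M_a + p_c}{M_bq_b^2}\,\mu_2(b) - \frac{p_c + 2M_a}{rM_bq_b^2}\,\mu_3(b) + \frac{M_a}{r^2M_bq_b^2}\,\mu_4(b).$$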

The corresponding terms of the expressions involving $E(a^2c^2)$ and $E(b^2c^2)$ are definable by inspection, and (i) is now reducible to an expression involving the 2nd, 3rd and 4th zero moments of the A-score, B-score and C-score distributions. We may collect terms involving moments of the A-score distribution as follows:

$$\left[\frac{M_b + p_c}{M_aq_a^2} + \frac{M_c + p_b}{M_aq_a^2}\right]\mu_2(a) - \left[\frac{p_c + 2M_b}{rM_aq_a^2} + \frac{p_b + 2M_c}{rM_aq_a^2}\right]\mu_3(a) + \left[\frac{1}{M_a^2} + \frac{M_b}{r^2M_aq_a^2} + \frac{M_c}{r^2M_aq_a^2}\right]\mu_4(a).$$

Since $(p_b + p_c) = 1 - p_a = q_a$ and $M_b + M_c = (r - M_a) = rq_a$, the above reduces to

$$\frac{r + 1}{M_aq_a}\,\mu_2(a) - \frac{2r + 1}{rM_aq_a}\,\mu_3(a) + \frac{1}{M_a^2q_a}\,\mu_4(a).$$

We now recall the expressions for the 2nd, 3rd and 4th zero moments of the A-score distribution whose definitive binomial is $(q_a + p_a)^r$:

$$\mu_2(a) = rp_aq_a + r^2p_a^2 = M_aq_a + M_a^2;$$
$$\mu_3(a) = rp_aq_a + 3r^2p_a^2q_a - 2rp_a^2q_a + r^3p_a^3 = M_aq_a + 3M_a^2q_a - 2M_ap_aq_a + M_a^3;$$
$$\mu_4(a) = rp_aq_a - 6rp_a^2q_a^2 + 7r^2p_a^2q_a - 11r^2p_a^3q_a + 6r^3p_a^3q_a + r^4p_a^4$$
$$= M_aq_a - 6M_ap_aq_a^2 + 7M_a^2q_a - 11M_a^2p_aq_a + 6M_a^3q_a + M_a^4.$$

Whence the terms of (i) involving moments of the A-score distribution are

$$\frac{r + 1}{M_aq_a}\,\mu_2(a) = (r + 1) + \frac{(r + 1)M_a}{q_a}; \tag{vi}$$
$$-\frac{2r + 1}{rM_aq_a}\,\mu_3(a) = -\frac{2r + 1}{r}\left[1 - 2p_a + 3M_a + \frac{M_a^2}{q_a}\right]; \tag{vii}$$
$$\frac{1}{M_a^2q_a}\,\mu_4(a) = \frac{1}{M_a} - \frac{6p_aq_a}{M_a} + 7 - 11p_a + 6M_a + \frac{M_a^2}{q_a}. \tag{viii}$$

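The terms involving $\mu_2(b)$, $\mu_3(b)$, $\mu_4(b)$, $\mu_2(c)$, $\mu_3(c)$ and $\mu_4(c)$ are derivable by inspection from the above, and (i) reduces to

$$\mu_2(C_3) = \sum_{s}\left[\frac{r + 1}{M_sq_s}\,\mu_2(s) - \frac{2r + 1}{rM_sq_s}\,\mu_3(s) + \frac{1}{M_s^2q_s}\,\mu_4(s)\right] - r(r + 4), \tag{x}$$

the summation extending over the three classes s = a, b, c. In (x) the sum of terms involving second moments is

$$3(r + 1) + (r + 1)\left(\frac{M_a}{q_a} + \frac{M_b}{q_b} + \frac{M_c}{q_c}\right) = 3(r + 1) + r(r + 1)\left(\frac{1}{q_a} + \frac{1}{q_b} + \frac{1}{q_c} - 3\right). \tag{xi}$$

The sum of terms involving third moments is

$$-\frac{2r + 1}{r}\left[3 + 3(M_a + M_b + M_c) - 2(p_a + p_b + p_c) + \frac{M_a^2}{q_a} + \frac{M_b^2}{q_b} + \frac{M_c^2}{q_c}\right]$$
$$= -\frac{2r + 1}{r}\left[3r + 1 + r^2\left\{\frac{(1 - q_a)^2}{q_a} + \frac{(1 - q_b)^2}{q_b} + \frac{(1 - q_c)^2}{q_c}\right\}\right]. \tag{xii}$$

In this expression

$$\frac{(1 - q_a)^2}{q_a} + \frac{(1 - q_b)^2}{q_b} + \frac{(1 - q_c)^2}{q_c} = \frac{1}{q_a} + \frac{1}{q_b} + \frac{1}{q_c} - 4.$$

Whence (xii) reduces to

$$8r^2 - 2r - 5 - \frac{1}{r} - r(2r + 1)\left(\frac{1}{q_a} + \frac{1}{q_b} + \frac{1}{q_c}\right). \tag{xiii}$$

The terms involving 4th moments reduce to

$$\frac{1}{M_a} + \frac{1}{M_b} + \frac{1}{M_c} - \frac{6(q_a + q_b + q_c)}{r} + 21 - 11(p_a + p_b + p_c) + 6(M_a + M_b + M_c) + \frac{M_a^2}{q_a} + \frac{M_b^2}{q_b} + \frac{M_c^2}{q_c}$$
$$= 10 + 6r - 4r^2 - \frac{12}{r} + r^2\left(\frac{1}{q_a} + \frac{1}{q_b} + \frac{1}{q_c}\right) + \frac{1}{M_a} + \frac{1}{M_b} + \frac{1}{M_c}. \tag{xiv}$$

By substitution of (xi)-(xiv) in (x) we have

$$\mu_2(C_3) = 8 - \frac{13}{r} + \frac{1}{M_a} + \frac{1}{M_b} + \frac{1}{M_c}. \tag{xv}$$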
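
The exact second zero moment of the 3-class statistic can be checked by enumeration against the closed form $8 - 13/r + 1/M_a + 1/M_b + 1/M_c$, which is the form the reduction is taken to yield here. The sketch below is illustrative only; the function names are assumptions, and the class proportions 1/2, 1/3 and 1/6 are hypothetical values chosen for the check rather than figures taken from the text.

```python
from fractions import Fraction
from math import factorial


def second_moment_exact(r, p):
    """E(C_3^2) by enumeration of every 3-class sample of size r
    (sampling with replacement); p is a triple of exact Fractions."""
    pa, pb, pc = p
    total = Fraction(0)
    for a in range(r + 1):
        for b in range(r - a + 1):
            c = r - a - b
            prob = (Fraction(factorial(r), factorial(a) * factorial(b) * factorial(c))
                    * pa ** a * pb ** b * pc ** c)
            stat = sum((x - r * ps) ** 2 / (r * ps)
                       for x, ps in zip((a, b, c), p))
            total += prob * stat ** 2
    return total


def second_moment_formula(r, p):
    """Closed form 8 - 13/r + (1/M_a + 1/M_b + 1/M_c), with M_s = r * p_s."""
    return 8 - Fraction(13, r) + sum(1 / (r * ps) for ps in p)


if __name__ == "__main__":
    p = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))   # hypothetical proportions
    for r in (1, 2, 3, 10, 50):
        print(r, second_moment_exact(r, p), float(second_moment_formula(r, p)))
```

The printed values show both the approach to the limit 8 and the dependence of the discrepancy on the reciprocals of the expected class numbers.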

Evidently, the expression on the right approaches the limit 8 in agreement with (ii) if r is very large. In the same way, we may show that

$$\mu_3(C_3) \simeq 48 = f(f + 2)(f + 4) \quad\text{and}\quad \mu_4(C_3) \simeq 384 = f(f + 2)(f + 4)(f + 6)$$

on the same assumption. However, it is equally evident that the moments of the statistic under discussion do not closely agree with those of the exact Chi-Square distribution unless the size of the sample is in fact very large. The closeness of the approximation and the sign of error depend not only on r but on the expected class proportions.

The exact definition of moments of higher order than $\mu_2$ for the 3-class case introduces
no new matter of principle. It will therefore suffice to cite the results, viz. :


4(79r — 69 28r NE F 1 
p{C3) = 48 + Se ) 2 +> MÈ . . e . . (xvi) 
pa d ¿=1 
(6368r? — 18123r + 12150)  2(340r? — 1109r + 862) "3 1 
ee. ee eT, 
ES z) da | po | = 
+ E a M, + —— È We e IE ` i ; . (xvii) 


It is easy, if also laborious, to recognise the common pattern for Cy, Cy, etc. Thus more 
generally for a system of n = ( f + 1) classes the moments approach those of the Chi-Square 
variate for f degrees of freedom, e.g. 


n? == oa | 


a eC Oeil 
aa r(3n3 + 21n? + 24n — 26) — 2(n + 3)(n? + 6n — 4) 
LLC.) = (n — Din + Da 3) AAA 
i=n i=n - 
a r(3n + 19) — (3n + 22) 27 1 a =. EE 


The accompanying tables (1-3) illustrate the exactitude of the foregoing expressions for the 3-class case, Q being there the sample value of the Chi-Square statistic and f the frequency of a sample of specified structure. Thus the column totals for fQ, fQ², fQ³, fQ⁴ are the numerical values of $\mu_1$, $\mu_2$, $\mu_3$, $\mu_4$. The reader may still ask: how close must be the correspondence between the moments of the statistic under consideration and those of the Chi-Square distribution of 15.02, if we are to use the latter legitimately as a fitting curve? The question invites laborious and formidable computations. It admits of no complete answer to date, and is worthy of exploration with the aid of the newest electronic machines.


TABLE 1 
Three-Class Universe (ba = 1, ba = $, Pe = 12 
MMe (BM (CMS 
Q = M, ? a ee M, 


(a) Unit Sample. 


Sample Structure Frequency 

A B C (f) Q 
— 1y2 ee: 5 en ee 

1 0 0 ba =t (1 en (23) + ( 13) 3 
4 3 <= 

: o SS Lo ENS 

dos y PE. CA 
4 3 2 
— +)3 — 1y2 1 = 

Ag ren or poe | GP Cb, CA 
4 3 


TABLE 2 
Three-Class Universe (ba = t, Pp = $, De = 12) 
A — M.a? (B — M,)y? C= MI? 
A B= My | | ) 


2 Ma M, M. 
(b) 2-fold Sample. 
Sample Structure 
A B C F Q 
0 A — 2)2 ES 
9 0 0 Da = 1% ( a ate ( = a ( a = 
2 3 6 
Es dl Taa E ESE 
1 1 0 2paPo =$ EA pi e + ( a =iE 
2 3 6 
138 AE i= 
2 3 6 
> E 9 — 2)2 US 
IA Iei CH @-) CA 
2 3 6 
— 1y2 li DA 1 = 5y 
o | 4 | 1 lean. | A 2 
2 6 
it oe ee ge yay a AN 
0 0 9 b? = ter ( 2) th ( 3) q 3 $) = 14 
TABLE 3
Three-Class Universe (Da = 1, Po = $, De = 1% 


ARI AA) (C- M) 
Q E Ma = M, 7 M, 
(c) 3-fold Sample. 
Sample Structure


A B C f o 70 Jo? fQ? fo’ 
| 3 0 0 Pi =+ 9 3 a3 739 eses 
Sa 0 3pip, = as e | oe $ y5 sp 
oo app. = ds ee ete | encase 

1 1 1 6paPibe = BE is 36 270 7025 30,375 
Ei -À 0 Spapt = vs E a Pa 343 2404 
o 2 Bpap? = vos 2 ya fees an IE 

E s 0 45 aly 6 2 2 8 48 

0 2 1 3pip. = Es 2 1 de Bhs 732 
poo E 2 | 391% = Tas $ az 4 is 5 

o A AAA as Fa 

Totals 2 34 7227 soa Zs = 
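
The column totals of Tables 1-3 can be checked mechanically. The following is a minimal sketch in Python, not part of the original text: it assumes the class proportions 1/4, 1/3 and 5/12 as read from the table headings, enumerates every sample structure, and accumulates the exact moments of Q about the origin alongside those of the Chi-Square distribution for 2 degrees of freedom. The function names are illustrative only.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def q_moments(probs, r, max_order=4):
    """Exact moments about the origin of Q = sum (x_i - M_i)^2 / M_i for r-fold samples."""
    means = [r * p for p in probs]            # M_i = r * p_i
    moments = [Fraction(0)] * max_order
    for counts in product(range(r + 1), repeat=len(probs)):
        if sum(counts) != r:
            continue                           # not an r-fold sample structure
        coef = factorial(r)                    # multinomial coefficient
        for c in counts:
            coef //= factorial(c)
        f = Fraction(coef)                     # frequency f of this structure
        for c, p in zip(counts, probs):
            f *= p ** c
        q = sum((c - m) ** 2 / m for c, m in zip(counts, means))
        for k in range(max_order):
            moments[k] += f * q ** (k + 1)     # column totals of fQ, fQ^2, ...
    return moments


def chi_square_moments(df, max_order=4):
    """Moments about the origin of Chi-Square: df (df + 2) ... (df + 2k - 2)."""
    out, term = [], 1
    for k in range(max_order):
        term *= df + 2 * k
        out.append(term)
    return out


# Assumed universe of Tables 1-3 (hypothetical reading of the headings).
probs = [Fraction(1, 4), Fraction(1, 3), Fraction(5, 12)]
for r in (1, 2, 3):
    print(r, [str(m) for m in q_moments(probs, r)], chi_square_moments(2))
```

The first moment agrees exactly with that of the fitting curve for every sample size; the higher moments approach it only as r grows, which is the correspondence the text asks about.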


CHAPTER 20 


SECOND THOUGHTS ON SIGNIFICANCE* 


20.00 STATISTICAL INFERENCE 


Much of the content of this volume deals with test procedure designed to assess the credentials 
of a unique null hypothesis. It would be unfitting to conclude it without reference to growing 
uneasiness with respect to the role of the unique null hypothesis in statistical reasoning. Indeed, 
it is a remarkable circumstance that our own generation has simultaneously experienced unpre- 
cedented eagerness to exploit new tests of significance and vigorous controversy on a wide front 
with respect to the credentials of statistical inference. On all sides we hear that statistical theory 
is the logic of the sciences ; but at least three divergent views about its rationale are current and 
have advocates of no mean intellectual stature. Meanwhile, it is not feasible to offer a definition 
of statistical inference acceptable to all mathematicians who concern themselves with the theory 
of probability or to all practical statisticians. Any such definition presupposes an answer to 
the age-old question: what is truth ? Any answer to the latter presupposes a personal credo 
embodying the relation of human knowledge to the external world. That of the writer is in the 
broadest sense of the term behaviourist. Accordingly, we shall here assume that (a) any recipes 
for arriving at truth (rules of inference) on the basis of inescapably imperfect acquaintance with the 
real world have as the end in view an unequivocal assertion coupled with an admission of liability 
to error; (b) what distinguishes the recipes we call statistical inference is that this admission 
—the uncertainty safeguard of the assertion—is numerically specifiable within an assumed frame- 
work of indefinitely protracted repetition. Thus the uncertainty safeguard is the probability 
of false statement. 

From a practical viewpoint, it is useful to distinguish sharply between two techniques 
of statistical inference, though we shall later seek for a formula embracing both : 


(a) test procedures, including the traditional null hypothesis significance tests, ostensibly 
devised to adjudicate on the merits of particular hypotheses ; 


(b) methods of estimation, the aim of which is to make legitimate statements about numerical 
characteristics of a universe or subuniverse on the basis of information supplied by a 
sample. 


Within the domain of test decisions, it is essential to distinguish between different targets 
of statistical inference : 


(i) to decide whether to regard a particular hypothesis as true or false ; 
(ii) to limit the risk of rejecting it if it is indeed correct. 


*I am greatly indebted to Raymond Wrighton for many (to me) profitable discussions of issues raised in this 
chapter which incorporates the substance of joint papers (Hogben and Wrighton) on The Statistical Theory of 
Therapeutic and Prophylactic Trials in the British Journal of Social Medicine (1952). 

1 F. J. Anscombe (1951), Mind, Vol. 60, makes a three-fold distinction : 


“ It is worthwhile to distinguish different purposes one may have in accepting a hypothesis: (i) to base 
an administrative decision on, (ii) for further testing and confirmation, (iii) for acceptance into the corpus 
of scientific knowledge, to be relied on in future work. There are risks, variously assessable, in coming to 
decisions in all three cases. For example, in case (iii), if the hypothesis is later found to be seriously false a 
lot of effort in investigating other points may have been wasted. Just as with prior confidences, risks are rather 
vague in magnitude, but in a formal theory it would be tempting to postulate a complete numerical risk-function.” 


Broadly speaking, this dichotomy tallies with a useful distinction between two types of 
statistical inference definable as follows : 


(i) unconditional, if the uncertainty safeguard specifies the unconditional probability of 
the falsity of the assertion itself ; 


(ii) conditional, if the uncertainty safeguard merely specifies the probability of rejecting the 
relevant hypothesis when it is true. 


In symbolic form we may express the unconditional probability of false assertion as 
$P_f = (1 - P_t)$, and the conditional probability of false assertion within the framework of a 
particular hypothesis A as $P_{f.a} = (1 - P_{t.a})$. Thus an assertion of the form $P_t = x$ is an 
example of unconditional statistical inference. Besides these two forms of statement we may make 
one of the form $P_t \ge x$. Evidently the more exact assertion $P_t = 0.95$ has no pragmatic priority 
over the less definite assertion $P_t \ge 0.95$; and we may prefer to regard a statement expressed 
in the form $P_t \ge x$ as an example of (i) if we deem $(1 - x)$ to be an acceptable level of 
uncertainty. On the other hand, it serves no useful purpose to make an assertion of the form 
$P_t \ge 0.30$ if we regard any figure above 5 per cent. as an unacceptable level of uncertainty. 
Hence we shall have no practical interest in stating an inference of the form $P_t \ge x$ unless we 
should be content with the assertion $P_t = x$. Otherwise any useful statement of statistical 
inference we may undertake conforms to (ii). 


First Aid for Inequalities. In what follows we shall make more use than heretofore of inequalities 
referred to briefly on p. 10 of Vol. I. In higher school and elementary college courses one deals 
mainly with equations. One has therefore little experience of the importance of, or opportunity to get 
familiar with, statements involving the ideograms < or > and ≤ (not greater than) or ≥ (not less than). 
The student of statistical theory should be thoroughly familiar with their use. Here follows a short 
dictionary of meanings which the reader may interpret or test by substituting whole numbers for literal 
symbols. 


(i) m ≥ x ≥ k or k ≤ x ≤ m means: x lies in the range k to m inclusive. 
(ii) m ≥ x > k or k < x ≤ m means: x is greater than k and no greater than m. 
(iii) m > x ≥ k or k ≤ x < m means: x is less than m and not less than k. 


(iv) the two statements x ≥ m and x < m constitute an exclusive binary classification of the range 
of values x may assume, as do also the two statements x > m and x ≤ m. 


(v) any of the following statements constitute an exclusive three-fold split of the range in which 
x may lie (k being greater than m): 

(a) x > k ; k ≥ x ≥ m ; x < m 
(b) x ≥ k ; k > x > m ; x ≤ m 
(c) x > k ; k ≥ x > m ; x ≤ m 
(d) x ≥ k ; k > x ≥ m ; x < m 

(vi) The following rules of sign reversal are important : 

k − b < k − a when b > a 
k − b ≤ k − a when b ≥ a 


(vii) If we denote the probability that the score x is no greater than m by P(x ≤ m) and probabilities 
referable to other statements about the interval in which x lies in accordance with the same pattern, 
certain important identities derive from the addition theorem in virtue of the above, e.g. 


P(x > m) + P(x ≤ m) = 1 
P(x > m) ≥ 1 − a if P(x ≤ m) ≤ a 


20.01 STATISTICAL INSPECTION 


In the foregoing discussion we have made a provisionally clear-cut distinction between estimation 
and test decision ; but we shall later see that the views of a growing and influential school of 
theoretical statistics make it possible to formulate a definition of statistical inference which 
exhibits the procedure of test decisions as a limiting case of the procedure of estimation. A 
new orientation, of which the theory is due to J. Neyman, E. S. Pearson and A. Wald, 
registers the impact on statistical theory of practices and metaphors which have developed over 
a long period under a shroud of trade secrecy within the research laboratories of large cor- 
porations, more especially the Bell Telephone Company. Much that is otherwise mysterious 
and highly abstract comes to life against the background of industrial practice. We shall 
therefore be better able to appreciate a fresh approach to the problem of significance, if we 
acquaint ourselves with some elements of the technique of statistical inspection (quality control) 
in commerce or industry. To do so some new terms will be necessary. On that account we 
must now digress. 

At the most elementary (p-chart) level the aim of statistical inspection is to ensure that the 
production process is working to schedule. The assumption is that no machine is perfect in the 
sense that the output is of uniform excellence. All we can hope for is that it continues to deliver 
samples of products with a fixed and satisfactory mean value in accordance with a known law 
of error, e.g. normal or binomial. The scoring system may be representative (e.g. duration 
of life of an electric light bulb) or taxonomic (e.g. percentage of inactive ampoules). If the 
sample score lies outside a range deemed satisfactory (e.g. 3σ level), the inspection system 
recommends to the management overhaul of the machinery to ascertain whether the result is 
a fluke. Otherwise we may speak of the process as being in statistical equilibrium (Fig. 127). 
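
As a minimal sketch of the p-chart idea, assuming a taxonomic score (proportion of defectives) with binomial error and the 3σ range mentioned above, one might compute the limits as follows. The function names and the figures in the usage lines are hypothetical illustrations, not taken from the text.

```python
from math import sqrt


def p_chart_limits(p_standard, sample_size, width=3.0):
    """Limits p +/- width * sqrt(p q / r) for the sample proportion of defectives."""
    sigma = sqrt(p_standard * (1 - p_standard) / sample_size)
    lower = max(0.0, p_standard - width * sigma)   # a proportion cannot fall below 0
    upper = min(1.0, p_standard + width * sigma)   # nor exceed 1
    return lower, upper


def in_equilibrium(defectives, sample_size, p_standard, width=3.0):
    """True if the sample proportion lies inside the range deemed satisfactory."""
    lower, upper = p_chart_limits(p_standard, sample_size, width)
    return lower <= defectives / sample_size <= upper


# Hypothetical figures: a process scheduled to yield 1 per cent. defectives,
# inspected by 500-fold samples.
print(p_chart_limits(0.01, 500))
print(in_equilibrium(6, 500, 0.01), in_equilibrium(20, 500, 0.01))
```

A score outside the limits is the signal for overhaul; a score inside them leaves the assumption of statistical equilibrium undisturbed.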


[Fig. 127. Sampling in the Stratified Universe of a Production Process in statistical equilibrium. 
The universe of daily output (consignments), i.e. the production process in statistical equilibrium 
at acceptable quality level, divides into five subuniverses of consignments: I, below intolerance 
level (i.l.); II, at i.l.; III, below acceptable quality level (a.q.l.) but above i.l.; IV, at a.q.l.; 
V, above a.q.l.] 


In what follows we assume that this is so. The mean daily output will then be up to standard. 
That some consignments will be below it (Fig. 128) is then fully consistent with the possibility 
that the process is working as well as may be. 

[Fig. 128. Sample Distributions for subuniverses of Fig. 127. The universe of daily output, with 
percentage defectives as the score and the acceptable quality level and intolerance level marked.] 

[Fig. 129. The risk of rejecting a consignment above Acceptable Quality Level is less than the risk of 
rejecting a consignment at a.q.l. and the risk of accepting a consignment below Intolerance Level is 
less than the risk of accepting it at i.l. Panels show distributions of 100-fold samples from 
consignments above a.q.l. (p = 0.20), at a.q.l. (p = 0.25, 25 per cent. defectives), at i.l. (p = 0.50, 
50 per cent. defectives) and below i.l. (p = 0.60); rejection criterion x ≥ 38, acceptance 
criterion x ≤ 37.] 

Thus the manufacturer or seller can at most guarantee that the product will very rarely 
fall short of a standard of precision called the acceptable quality level (a.q.l.), e.g. that the sectional 
area of two by two inch wooden battens will not be less than 3.9 sq. in. or that the proportion of 
inert ampoules of post-pituitary extract in a consignment will not exceed one per cent. Complete 
inspection to maintain this standard would be costly if practicable. Often it is impracticable, 
because testing the product (e.g. mean duration of life of an electric light bulb or activity of a 
glandular extract) involves destroying it. What we call quality control is therefore a sampling 
procedure. Evidently, a sample from a consignment above a.q.l. might be below it. Thus 
a 500-fold sample at random from a consignment of ampoules of which only 0.9 per cent. are 
defective might well contain 6 defectives (1.2 per cent.). To reject every consignment as 
below the 1 per cent. guaranteed a.q.l. if the sample contained more than 1 per cent. defectives 
would therefore be wasteful. So one aim of quality control is to ensure that the producer will 
rarely discard a consignment, if it is at or above a.q.l. 
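
How wasteful such a rule would be can be put in figures. The sketch below, which is an illustration and not part of the original argument, assumes a binomial law for the number of defectives in the 500-fold sample and computes the chance that a consignment with 0.9 per cent. defectives yields more than 1 per cent. defectives (6 or more) in the sample.

```python
from math import comb


def binomial_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p), computed as 1 - P(X <= k - 1)."""
    below = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k))
    return 1.0 - below


# 500-fold sample from a consignment with 0.9 per cent. defectives;
# "more than 1 per cent." of 500 means 6 defectives or more.
risk = binomial_tail(500, 0.009, 6)
print(f"chance of rejecting a consignment above a.q.l.: {risk:.3f}")
```

On these assumptions the rule would discard between a quarter and a third of consignments which in fact satisfy the guarantee.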

Given the law of distribution, e.g. a normally distributed sample mean or a binomial 
distribution of proportionate defectives, the producer or seller may decide to reject consignments 
on the basis of sample structure in such a way as to exclude in the long run only α per cent. 
(e.g. 5 per cent.) of those at exactly a.q.l. We call this the producer’s risk. This procedure, 
which ensures a risk less than α per cent. of rejecting consignments above a.q.l., is consistent 
with endorsing some consignments which do not in fact satisfy the a.q.l. guaranteed to the 
consumer. To safeguard the confidence of the latter, the seller or producer must ensure a low 
risk of releasing consignments too defective to be tolerable. This presupposes some standard 
we shall here call the level of intolerance (i.l.) and a criterion of acceptance comparable with the 
criterion of rejection, i.e. acceptance on condition that the risk of release at i.l. is only β per 
cent. (e.g. 5 per cent.). We call this consumer's risk.* The risk of rejecting a consignment 
above a.q.l. is less than producer’s risk as already defined, i.e. risk of rejection at a.q.l. 
The risk of accepting a consignment below i.l. is likewise less than the so-called consumer's 
risk, i.e. risk of acceptance at i.l. (Fig. 129). 

To give a more precise meaning to these terms it is essential to be clear that they refer 
in this context to opposite tails of a sample distribution. Accordingly, we must first recall 
(Vol. I, Chapter 5) the distinction between vector and modular assessment of risk. Thus the 
modular risk that a sample value of a normally distributed variate will deviate from the true 
mean by more than ± 1.96σ is 5 per cent. ; but the 5 per cent. level is at − 1.64σ for the risk 
that a score will fall short of the mean and at + 1.64σ for the risk that it will exceed the mean 
by so much. A fictitious example will clarify the issue. We shall suppose that : 


(a) the producer sets his level of acceptability for a consignment of battens at a mean figure 
$M_a = 20$ mm. in thickness and the level of intolerance at a mean figure $M_b = 18$ mm. 
in thickness ; 


(b) under ascertained working conditions of the sawmill the standard error of a 100-fold 
sample mean is 0.75 mm. with an approximately normal distribution. 

If inspection shows that the sample mean of a 100-fold sample of a particular load is 
x = 18.75, the equivalent standard scores will be: 


(a) if from a consignment at a.q.l. 

$$\frac{18.75 - 20}{0.75} \simeq -1.66 ;$$

(b) if from a consignment at i.l. 

$$\frac{18.75 - 18}{0.75} = +1.$$


The decision to accept only samples above 18.75 would thus involve a producer’s risk at the 
1.66σ level and a consumer's risk at the 1σ level. From the table of the normal we find 
that the area up to − 1.66σ is 0.049 (nearly 5 per cent.) and the area beyond + σ is 0.159 (nearly 
16 per cent.). If the true mean were above the a.q.l. the corresponding standard score would 
be numerically greater than − 1.66 and of the same sign. If below i.l. it would be numerically 
greater than + 1 and of the same sign. If our criterion of acceptability is 18.75 for the 100-fold 
sample mean, the risk of rejecting a consignment at or above a.q.l. is thus less than 5 per cent. 
and the risk of accepting a consignment at or below i.l. is 16 per cent. or less. 


* The expression consumer’s risk has a taint of uplift and is somewhat misleading on that account. It suggests 
that the primary end in view is to look after the interests of the consumer. It is more precise to regard the end 
in view as that of limiting the producer’s risk of losing the consumer’s goodwill. 
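
The two tail areas of the batten example can be reproduced without a table. The following sketch assumes the normal law and the figures of the example (a.q.l. 20 mm., i.l. 18 mm., standard error 0.75 mm.); the function names are illustrative, and the normal integral is obtained from the error function of the Python standard library.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Area of the unit normal curve up to the standard score z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def batten_risks(criterion, m_aql=20.0, m_il=18.0, std_error=0.75):
    """Producer's and consumer's risk when we accept only sample means above `criterion`."""
    z_aql = (criterion - m_aql) / std_error      # standard score if at a.q.l.
    z_il = (criterion - m_il) / std_error        # standard score if at i.l.
    producers_risk = normal_cdf(z_aql)           # rejecting a consignment at a.q.l.
    consumers_risk = 1.0 - normal_cdf(z_il)      # accepting a consignment at i.l.
    return producers_risk, consumers_risk


for criterion in (18.75, 19.0):
    alpha, beta = batten_risks(criterion)
    print(f"criterion {criterion}: producer's risk {alpha:.3f}, consumer's risk {beta:.3f}")
```

Moving the criterion between the two levels trades one risk against the other; only a change of sample size can reduce both at once.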

Suppose now that we equalise the two risks, i.e. choose a score criterion (x) which makes 
the two standard scores numerically equal with opposite signs, so that 

$$\frac{x - 18}{0.75} = \frac{20 - x}{0.75}.$$

Thus x = 19. In this event, the numerical value of the standard score is 1.33 and the table of 
the normal integral gives the area of the excluded tails as 0.092 or about 9 per cent. risk for 
both consumer and producer. By making the size (r) of sample larger we can, of course, make 
the variance of the sample mean smaller and the standard score itself larger. The s.d. of the 
two distributions will be in the ratio $r^{-1/2} : 100^{-1/2}$. In any case, the two risks will be equal if 
x = 19. If we want to keep the two risks at 5 per cent. or 1.64σ level we therefore have : 

$$\frac{\sqrt{r}}{10} \cdot \frac{19 - 18}{0.75} = 1.64 = \frac{\sqrt{r}}{10} \cdot \frac{20 - 19}{0.75}, \qquad r \simeq 151.$$
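
The same arithmetic serves for any pair of levels. As a minimal sketch, assuming a normally distributed sample mean and equal risks on either side of a criterion midway between the two levels, the required sample size follows from the s.d. of a unit sample; here that s.d. is taken as 0.75 × √100 = 7.5 mm., which is what the 100-fold standard error of the example implies. The function name is hypothetical.

```python
from math import sqrt


def sample_size_for_equal_risks(m_aql, m_il, unit_sd, h):
    """Size r at which the midway criterion lies h standard errors from each level."""
    half_gap = (m_aql - m_il) / 2.0        # distance from criterion to either level
    return (h * unit_sd / half_gap) ** 2   # from h = half_gap * sqrt(r) / unit_sd


unit_sd = 0.75 * sqrt(100)                 # s.d. of a single batten implied by the example
r = sample_size_for_equal_risks(20.0, 18.0, unit_sd, 1.64)
print(f"r = {r:.2f}")
```

The result is a little over 151, the figure given above; in practice one would take the next whole number above it so that neither risk exceeds the prescribed level.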


As an alternative illustration we may (again fictitiously) suppose that the pharmacist sets 
an a.q.l. for the proportion of below-standard ampoules of a preparation at 25 per cent. and 
an intolerance level of 50 per cent. For small samples (under 50) the normal approximation 
will be poor and for even larger samples the half interval correction (p. 116) will make a big 
difference to our assessment of per cent. risk. For illustrative purposes, we may therefore 
content ourselves with defining the risk in terms of the critical ratio (h). First suppose that 
r = 27 (Fig. 130). We shall denote by x the number of defectives in the 27-fold sample and 
define $x_c$ so that we: (a) reject a consignment if $x > x_c$; (b) accept a consignment if $x < x_c$. 
We now express $x_c$ in terms of the true mean and the sample s.d. If $p_a = \tfrac14$ (a.q.l. 25 per cent.) : 

$$x_c = rp_a + h\sqrt{rp_aq_a} = \frac{27}{4} + \frac{h\sqrt{81}}{4} = \frac{27 + 9h}{4}.$$

If $p_b = \tfrac12$ (i.l. 50 per cent.), we have likewise : 

$$x_c = rp_b - h\sqrt{rp_bq_b} = \frac{27}{2} - \frac{h\sqrt{27}}{2} = \frac{27 - 3h\sqrt{3}}{2}.$$

Whence we have 

$$\frac{27 + 9h}{4} = \frac{54 - 6h\sqrt{3}}{4}.$$

In this case h is numerically a little less than 1.4 and $x_c \simeq 9.9$, i.e. we should reject consignments 
if the 27-fold sample contained 10 or more defectives and accept them if they contained 9 or less. 
Now we may wish to make our risk smaller, let us say at 2σ level. If so, we reject when 

$$x > \frac{27}{4} + \frac{2\sqrt{81}}{4}, \quad \text{i.e. } x = 12 \text{ or over.}$$

We should accept if 

$$x < \frac{27}{2} - 3\sqrt{3}, \quad \text{i.e. } x = 8 \text{ or less.}$$

This would leave us with consignments about which we could make no decision to cover both 
risks on equal terms (at the 2σ level), i.e. if the 27-fold sample yields 9, 10 or 11 defectives. 

[Fig. 130. Sampling on the basis of an incompletely decisive rejection-acceptance criterion 
(incomplete inspection plan to safeguard producer's risk and consumer's risk). Here p is the proportion 
of defectives. The values p = 0.25 (75 per cent. up to standard) and p = 0.50 (only 50 per cent. 
up to standard) respectively define a.q.l. and i.l. Panel A: distribution of defectives in a 27-fold 
sample if from a consignment at a.q.l. (p = 1/4), mean 6.75. Panel B: distribution of defectives in a 
27-fold sample if from a consignment at i.l. (p = 1/2). Accept 8 or less; re-inspect 9 to 11; reject 
12 or over.] 

We could, of course, retest by taking another 27-fold sample ; but the outcome might also be 
inconclusive. So it is more economical to cut our cloth to the standard set by deciding in advance 
what size (r) of sample will make h = 2 (or other prescribed risk criterion) when 

$$rp_a + h\sigma_a = x_c = rp_b - h\sigma_b.$$

When h = 2 we find that r ≃ 56, in which case we should reject consignments if the 56-fold 
sample yielded 21 or more defectives, and pass them if it yielded 20 or less. 
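
Both calculations of the ampoule example, the critical ratio attainable with a given sample and the sample needed for a given critical ratio, follow from the one equation which places the criterion h standard deviations above the mean at a.q.l. and h below the mean at i.l. The sketch below assumes, as the text does, that risk is expressed by the critical ratio without half-interval correction; the function names are illustrative.

```python
from math import sqrt


def critical_ratio(r, p_aql, p_il):
    """h and x_c such that r*p_aql + h*sd_aql = x_c = r*p_il - h*sd_il."""
    sd_aql = sqrt(r * p_aql * (1 - p_aql))
    sd_il = sqrt(r * p_il * (1 - p_il))
    h = r * (p_il - p_aql) / (sd_aql + sd_il)
    x_c = r * p_aql + h * sd_aql
    return h, x_c


def sample_size_for_ratio(h, p_aql, p_il):
    """Size r at which the same equation holds for a prescribed critical ratio h."""
    root_r = h * (sqrt(p_aql * (1 - p_aql)) + sqrt(p_il * (1 - p_il))) / (p_il - p_aql)
    return root_r ** 2


h27, xc27 = critical_ratio(27, 0.25, 0.50)
print(f"r = 27: h = {h27:.3f}, x_c = {xc27:.2f}")
print(f"h = 2:  r = {sample_size_for_ratio(2.0, 0.25, 0.50):.2f}")
h56, xc56 = critical_ratio(56, 0.25, 0.50)
print(f"r = 56: h = {h56:.3f}, x_c = {xc56:.2f}")
```

Since the critical score for the 56-fold sample falls between 20 and 21, every sample leads to a decision one way or the other.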

At this point, it is important to realise that we cannot have the best of both worlds by adding 
the result obtained from a first and inconclusive trial based on a 27-fold sample to that of a 
second sample of 29 in order to make up a 56-fold trial which must give a conclusive result. 
This procedure signifies that we do not give a subsequent 29-fold sample the opportunity to 
pair off with a 27-fold sample unless the latter contains 9-11 defectives. Thus only 3 out of 
28 classes of 27-fold sample would go to the making of the sample of 56; and we can make up 
a sample of 56 in accordance with the assumption of random choice by adding samples of 27 
and 29 only if every possible sample of 27 has an equal chance of association with every possible 
sample of 29. This raises the question : is it necessary to prescribe in advance a sample large 
enough to ensure a conclusive result? In other words, can we devise an admissible inspection 
scheme which we can terminate so soon as the result is decisive ? We shall return to this 
issue at a later stage. 
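
That the patched-up procedure does not carry the risks of a genuine 56-fold trial can be verified by direct enumeration. The sketch below is an illustration under stated assumptions: exact binomial sampling, a first 27-fold sample accepted at 8 or less and rejected at 12 or over, and in the inconclusive cases a further 29-fold sample with rejection when the combined score reaches 21. The function names are hypothetical.

```python
from math import comb


def pmf(n, p, x):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)


def tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p); empty sum (zero) if k exceeds n."""
    return sum(pmf(n, p, x) for x in range(max(k, 0), n + 1))


def reject_single(p, n=56, reject_at=21):
    """Chance of rejection when all 56 items are drawn as one random sample."""
    return tail(n, p, reject_at)


def reject_two_stage(p, n1=27, n2=29, accept_max=8, reject_min=12, reject_at=21):
    """Chance of rejection when the second sample is drawn only after an inconclusive first."""
    prob = tail(n1, p, reject_min)                       # rejected outright at stage one
    for x1 in range(accept_max + 1, reject_min):         # inconclusive scores 9, 10, 11
        prob += pmf(n1, p, x1) * tail(n2, p, reject_at - x1)
    return prob


for p in (0.25, 0.50):
    print(f"p = {p}: single {reject_single(p):.4f}, two-stage {reject_two_stage(p):.4f}")
```

The two columns differ because the two-stage rule rejects some consignments on the first sample which a complete 56-fold sample would have passed, and conversely; the nominal risks of the 56-fold plan therefore do not apply to it.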

Here it will be well to emphasise that the foregoing outline of the inspection problem 
is highly simplified for expository use in connexion with the main theme of this chapter. We 
have used highly fictitious examples for arithmetical simplicity. We have assumed: (a) as 
will usually be true, the critical score $x_c$ is not an integer, so that all scores obtainable lie on 
one side of it or the other; (b) since the sample is a small fraction of a large consignment, we 
can dispense with the replacement condition ; (c) we know the true variance of the sample 
distribution when we score by the representative method and have good enough reason for 
regarding the normal approximation as adequate. Actually, a normal approximation will not 
be a good one when the method of scoring is taxonomic, as in the last example. Commonly, 
the admissible proportion of defectives will be very much smaller than 0.25 or 0.5 and a Poisson 
distribution might then give us a better picture of what we are doing. 

Within the same framework of limitations let us now look at our last illustration from a 
different point of view. We have chosen to adopt as our criterion of rejection for the 56-fold 
sample x ≥ 22 and as our criterion of acceptance x ≤ 21. This ensures a risk of approximately 
5 per cent. that we shall reject consignments at a.q.l. (p = 0.25) and an equivalent risk that 
we shall accept consignments at i.l. (p = 0.5), i.e. we shall accept 95 per cent. or more 
consignments if at or above a.q.l. and reject 95 per cent. or more if at or below i.l. It goes 
without saying that the quality of some consignments will neither be up to a.q.l. nor as low as 
i.l. For instance, the true proportion of defectives in the consignment might be 0.38, in which 
event the 56-fold sample mean would lie between 21 and 22. Our rejection-acceptance criterion 
would then ensure about 50 per cent. risk of rejecting and the same risk of accepting such a 
consignment. 
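
How the chance of rejection changes with the true quality of the consignment can be tabulated directly. The following sketch assumes exact binomial sampling and the rejection criterion x ≥ 22 stated in the preceding paragraph; the function name and the choice of proportions to tabulate are illustrative only.

```python
from math import comb


def reject_probability(p, n=56, reject_at=22):
    """P(x >= reject_at) for a consignment whose true proportion of defectives is p."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(reject_at, n + 1))


for p in (0.20, 0.25, 0.38, 0.50, 0.60):
    reject = reject_probability(p)
    print(f"p = {p:.2f}: reject {reject:.3f}, accept {1 - reject:.3f}")
```

The chance of rejection rises steadily from a small figure at a.q.l. to near certainty beyond i.l., passing through roughly even odds for consignments of intermediate quality.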

We speak of the inspection plan as complete if it always leads to a decision at both a 
prescribed producer’s risk (α) and a prescribed consumer’s risk (β). We then have a model of what 
Wald calls a decision, in contradistinction to a significance, test. Formally we may describe the 
plan in terms of a rule to reject one or other alternative hypothesis : hypothesis A that the 
consignment is at or above acceptable quality level, i.e. $M \ge M_a$ ; hypothesis B that the consignment 
is at or below intolerance level, i.e. $M \le M_b$. The test, i.e. the inspection plan itself, is the rule: 
reject the consignment only if the sample score $x < x_c$. In effect, we therefore say: reject 
hypothesis A if $x < x_c$, and reject hypothesis B if $x > x_c$. Our decision is verbally equivalent 
either to denying that the consignment is up to a.q.l. or to denying that the consignment is at or 
below i.l. Neither decision implies the denial of the possibility that the consignment mean (M) 
lies between the two levels ($M_b < M < M_a$). 

We chose $x_c$ so that α is the probability of rejecting the consignment at a.q.l. ($M = M_a$) 
and β is that of accepting one at i.l. ($M = M_b$). Since the probability of rejecting a consignment 
above a.q.l. will be less than α, and that of accepting one below i.l. will be less than β, we may 
say that the risk of rejecting hypothesis A ($M \ge M_a$) when it is true is $P_{f.a} \le \alpha$ and the risk 
of rejecting hypothesis B ($M \le M_b$) when it is true is $P_{f.b} \le \beta$. To choose $x_c$, our rejection 
score criterion, so that the risks α and β are themselves acceptable (e.g. α = 0.05 = β) we must 
prescribe the size of the sample in advance. Our inspection plan which guarantees a decision to 
act in one way or the other is thus a test which guarantees minimum risk of wrongly rejecting 
one or other prescribed hypothesis, if we consistently follow the same rejection rule for samples of 
the same size. How such a prescription of test procedure differs from that of Yule and of Fisher 
will be the theme of comment at a later stage. 


20.02 BAYES’ THEOREM AND THE SEQUENTIAL RATIO 


In Chapter 5 of Vol. I we have exhibited a much-discussed theorem of unconditional statistical 
inference, that of Thomas Bayes (1763), as a balance sheet which sets out what information 
we require in order to specify the long-run frequencies with which we shall arbitrate correctly on 
the assumption that one or other of an exhaustive set of hypotheses is correct. We conceive each 
hypothesis as the assertion that a sample of specified composition comes from a sub-universe 
also of specified composition, leaving open the possibility that the composition of different sub- 
universes may be identical. Thus the sub-universes may be urns containing coloured balls. 
We shall then speak of them as urns of the same type if they contain balls of the same colour 
in the same proportions. This definition suffices if we sample with replacement. If we sample 
without replacement, we must make the additional assumption that urns of the same type contain 
the same total number of balls. The most general specification of the sample taken from one 
or other urn will, of course, be the term of a multinomial expansion ; but it will here suffice 
to specify the sample on the assumption that each sub-universe is of the 2-class kind, as when 
we distinguish balls by colour as red and other. For simplicity, we shall also assume (unless 
otherwise stated) that sampling is with replacement. 

As an example of the stratified universe of Bayes we may envisage a set of 10 urns constituted 
as follows : 


              No. of Urns    Proportion of Red balls 
Type I             3                  2/3 
Type II            5                  1/2 
Type III           2                  3/4 


Of this set-up we may initially define 3 parameters ($P_1 = \tfrac{3}{10}$, $P_2 = \tfrac{5}{10}$, $P_3 = \tfrac{2}{10}$) respectively 
specifying the proportionate frequency with which we take a sample at random from any one 
type of urn; but the frequency with which samples of a particular composition will occur in 
any one of the three classes of samples so specified will depend on the parameters ($p_1 = \tfrac23$, $p_2 = \tfrac12$, 
$p_3 = \tfrac34$) which specify the proportion of red balls in the urn definitive of the class. Within 
the framework of the illustrative (but not necessary) assumption that we sample with replacement, 
the long run proportionate frequencies with which x red balls will occur respectively in an 
r-fold sample from one or other type of urn are: 

$$P_{x.1} = r_{(x)}\frac{2^x}{3^r}; \qquad P_{x.2} = r_{(x)}\frac{1}{2^r}; \qquad P_{x.3} = r_{(x)}\frac{3^x}{4^r}.$$

Within the framework of the 2-class universe and a 2-stage sampling process, Bayes’ 
Theorem is an exact answer to the question: what is the probability of correctly asserting that 
an r-fold sample comes from a sub-universe of type M if it contains x items of a class A (e.g. 
of red colour in this example)? If we speak of such an assertion as the adoption of hypothesis 
M, we may also phrase the issue as: what is the long-run proportionate frequency of correct 
action based on the assumption that hypothesis M is true? The answer follows from the 
product and addition rules, as we shall see below. 

It is more easy to appreciate its rationale, if we first set out the result numerically as a balance 
sheet of long-run frequencies. Let us assume that r = 4 and x = 2, i.e. that our sample consists 
of 4 balls of which 2 are red, so that 

$$P_{2.1} = 6\left(\tfrac23\right)^2\left(\tfrac13\right)^2 = \tfrac{8}{27}; \qquad P_{2.2} = 6\left(\tfrac12\right)^4 = \tfrac38; \qquad P_{2.3} = 6\left(\tfrac34\right)^2\left(\tfrac14\right)^2 = \tfrac{27}{128}.$$


                 Proportion of all 4-fold Samples 

                 With 2 red balls           Other                       Total 
From Type I      3/10 × 8/27 = 4/45         3/10 × 19/27 = 19/90        3/10 
From Type II     1/2 × 3/8 = 3/16           1/2 × 5/8 = 5/16            1/2 
From Type III    1/5 × 27/128 = 27/640      1/5 × 101/128 = 101/640     1/5 
Total            5505/17,280                11,775/17,280               1 


If we now abstract from this table the items which refer to samples with the structure specified 
we derive the following proportion of 4-fold samples with 2 red balls : 


Type I       4/45 ÷ 5505/17,280 = 512/1835 
Type II      3/16 ÷ 5505/17,280 = 1080/1835 
Type III     27/640 ÷ 5505/17,280 = 243/1835 
Total        1 
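
The balance sheet lends itself to exact computation. The following sketch, which is an illustration and not part of the original text, takes the 10-urn set-up with proportionate frequencies 3/10, 5/10, 2/10 and red-ball proportions 2/3, 1/2, 3/4 (the values consistent with the worked figures) and reproduces the joint and conditional frequencies for a 4-fold sample with 2 red balls. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def bayes_balance_sheet(priors, red_props, r, x):
    """Joint frequencies P_h * P_{x.h}, their total P_x, and the ratios P_{h.x}."""
    joint = [P * comb(r, x) * p**x * (1 - p)**(r - x)
             for P, p in zip(priors, red_props)]
    total = sum(joint)
    posterior = [j / total for j in joint]
    return joint, total, posterior


# Assumed set-up: 3, 5 and 2 urns of Types I, II and III.
priors = [Fraction(3, 10), Fraction(5, 10), Fraction(2, 10)]
red_props = [Fraction(2, 3), Fraction(1, 2), Fraction(3, 4)]

joint, total, posterior = bayes_balance_sheet(priors, red_props, r=4, x=2)
for name, j, post in zip(("I", "II", "III"), joint, posterior):
    print(f"Type {name}: joint {j}, posterior {post}")
print("total", total)
```

Exact fractions are used so that the output can be compared directly with the entries of the balance sheet; Python prints each fraction in its lowest terms.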


We may set out the universe of sampling in more general terms as in Table 1 and the universe 
of samples as in Table 2. Table 3 then shows the balance sheet of long-run frequencies in the 
same terms as above. For a formal statement of the theorem we may employ the following 
symbols : 


$p_h = (1 - q_h)$ = proportionate frequency of a successful draw in a unit trial from 
sub-universe H. 


$P_h$ = proportionate frequency that an r-fold sample comes from sub-universe H. 


$P_{x.h}$ = conditional proportionate frequency that x is the score (successes) in the r-fold sample 
if it comes from sub-universe H, so that 


with replacement : $P_{x.h} = r_{(x)}\,p_h^{x}\,q_h^{r-x}$ ; 


without replacement (from sub-universe of n items) : $P_{x.h} = r_{(x)}\,(np_h)^{(x)}\,(nq_h)^{(r-x)} \div n^{(r)}$. 


$P_{hx}$ = proportionate frequency of the combined event that x is the sample score and that the 
sample comes from sub-universe H. 


Then by the product rule: 

$$P_{hx} = P_h \cdot P_{x.h}.$$


$P_x$ = proportionate frequency of the event that the score of any r-fold sample is x. 

Then by the addition rule: 

$$P_x = \sum_{h=1}^{h=\infty} P_{hx} = \sum_{h=1}^{h=\infty} P_h \cdot P_{x.h}.$$


$P_{h.x}$ = proportionate frequency of the conditional event that the r-fold sample comes from 
sub-universe H if x is the sample score. 


Then by the product rule: 

$$P_{hx} = P_x \cdot P_{h.x}.$$

Whence we obtain the theorem : 

$$P_{h.x} = \frac{P_{hx}}{P_x} = \frac{P_h \cdot P_{x.h}}{\sum_{h=1}^{h=\infty} P_h \cdot P_{x.h}}.$$
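
The theorem translates directly into a few lines of code for either method of sampling. The sketch below is a hedged illustration: each urn is described by a hypothetical pair (number of red balls, total number of balls), the with-replacement case uses the binomial term, and the without-replacement case uses the equivalent hypergeometric term. The urn sizes of 12 are an assumption chosen so that the red-ball proportions are 2/3, 1/2 and 3/4.

```python
from math import comb


def posterior(priors, urns, r, x, replace=True):
    """P_{h.x} for each urn type, given priors P_h and urns as (reds, total) pairs."""
    joint = []
    for P, (reds, total) in zip(priors, urns):
        if replace:
            p = reds / total
            like = comb(r, x) * p**x * (1 - p)**(r - x)          # binomial term
        else:
            like = comb(reds, x) * comb(total - reds, r - x) / comb(total, r)
        joint.append(P * like)                                   # product rule
    p_x = sum(joint)                                             # addition rule
    return [j / p_x for j in joint]


priors = [0.3, 0.5, 0.2]
urns = [(8, 12), (6, 12), (9, 12)]     # hypothetical urn sizes

print(posterior(priors, urns, r=4, x=2, replace=True))
print(posterior(priors, urns, r=4, x=2, replace=False))
```

With replacement, the size of the urn is immaterial and only the proportion of red balls counts; without replacement, urns of the same type must also contain the same total number of balls, as the text points out.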


TABLE 1 
A Bayes’ Model of a Stratified Universe of Sampling 


Types of Urn                                    I                II               III              Total 
Nos. of each type                               n_1              n_2              n_3              N 
Proportionate contribution to all samples       P_1 = n_1/N      P_2 = n_2/N      P_3 = n_3/N      1 
Proportion of red balls in each type            p_1              p_2              p_3 
Frequency of r-fold samples with x red balls*   P_{x.1} = r_(x) p_1^x q_1^(r-x)   P_{x.2} = r_(x) p_2^x q_2^(r-x)   P_{x.3} = r_(x) p_3^x q_3^(r-x) 


TABLE 2 
The Bayes’ Model for r-fold samples 


From Urn    With x red balls                     With more or less than x red balls    Total 
I           P_1 · P_{x.1}                        P_1 (1 − P_{x.1})                     P_1 
II          P_2 · P_{x.2}                        P_2 (1 − P_{x.2})                     P_2 
III         P_3 · P_{x.3}                        P_3 (1 − P_{x.3})                     P_3 
Total       P_x = Σ (h = 1 to 3) P_h · P_{x.h}   1 − P_x                               1 


* If sampling is without replacement we write for an urn containing $U_m$ balls of which $S_m$ are red 

$$P_{x.m} = r_{(x)}\,S_m^{(x)}\,(U_m - S_m)^{(r-x)} \div U_m^{(r)}.$$


TABLE 3 
Proportionate Frequency of the event that the r-fold sample contains x red balls 


Urn     Comes from                         Does not come from    Total 
I       P_{1.x} = P_1 · P_{x.1} ÷ P_x      1 − P_{1.x}           1 
II      P_{2.x} = P_2 · P_{x.2} ÷ P_x      1 − P_{2.x}           1 
III     P_{3.x} = P_3 · P_{x.3} ÷ P_x      1 − P_{3.x}           1 


In Table 3 $P_{h.x}$ is the proportionate frequency of the event that an r-fold sample with 
score x comes from an urn of type H, i.e. the long-run frequency, among all samples so constituted, 
of those which come from such an urn. To assert that an r-fold sample does so on the basis 
of the additional information that the sample score is x, is to assert that hypothesis H is true ; 
and $P_{h.x}$ is the proportion of such assertions which correctly describe what happens in the 
long run. Thus we may re-interpret (Table 4) the items of Table 3 as probabilities of the truth 
or falsehood of the assertion that a particular hypothesis is correct. Any items of the form 
$(1 - P_{h.x})$ in the column headed False thus correspond to what we have called in 20.00 the 
uncertainty safeguard of the unconditional assertion that hypothesis H is true. 


TABLE 4 


Long-run frequency of Statements about the r-fold sample containing x red balls 


From Urn    True                               False             Total 
I           P_{1.x} = P_1 · P_{x.1} ÷ P_x      1 − P_{1.x}       1 
II          P_{2.x} = P_2 · P_{x.2} ÷ P_x      1 − P_{2.x}       1 
III         P_{3.x} = P_3 · P_{x.3} ÷ P_x      1 − P_{3.x}       1 


Before proceeding further it may be helpful to some readers if we first pause to dispose 
of a common difficulty. The logic of Bayes’ theorem is not obscure or subtle against the back- 
ground of an urn model ; but appreciation of the relevance of the model to statistical inference 
in the domain of the world’s work makes no mean demands upon the imagination.* A biological 
illustration may assist the reader who boggles at this step. We shall suppose that a culture 
of fruit flies contains 100 females of which 5 carry a sex-linked lethal gene. Concerning one 
of these flies we know that 2 of its progeny are female. Now this will occasionally happen 
if it is normal, but much less rarely if it carries a sex-linked lethal gene. If we merely know 


* To add to the difficulties of the plain man, current and authoritative works repeat such paradoxical definitions 
as that the prior probability of the hypothesis is the probability assigned thereby to the “ event before it has 
happened ”. Actually, our prior and posterior probabilities refer to different events, one to the probability that 
any r-fold sample comes from a sub-universe of type H and one to the probability that a particular sub-class of 
such samples (i.e. those with score x in our model set-up) comes from it. 
that an individual is a female we know that there is a 95 per cent. chance that she is normal.
This is the prior probability of the null hypothesis that the female is normal. The null hypothesis
assigns as the probability that any single offspring of a normal fly will be female $p_a = \tfrac{1}{2}$. The
alternative hypothesis that she carries a sex-linked lethal gene assigns as the probability that any
single offspring will be female $p_b = \tfrac{2}{3}$. If our illustrative mother fly has 100 offspring of which
three-fifths (i.e. 60) are female, the frequencies of correct judgments based on the assumption
of normality or otherwise will be in the ratio

$$\frac{95}{100} \cdot \frac{100!}{60!\,40!} \cdot \frac{1}{2^{100}} \; : \; \frac{5}{100} \cdot \frac{100!}{60!\,40!} \cdot \frac{2^{60}}{3^{100}}$$


This is approximately 6.75 : 1 in favour of the null hypothesis. If we estimate the relative
frequencies of correct decisions on the false assumption that there are just as many female flies
of both sorts in the culture, we should obtain the figure 0.36 : 1 or about 3 : 1 against the null
hypothesis. Such an assumption, known as Bayes’ postulate, is on all fours with a common mis- 
conception implicit in the lay-out of age-case distributions in extant medical textbooks. Peptic 
ulcer is a much more common complaint after 40 than before that age ; but it would be a fallacy 
to assert that a conscript is over 40 because he has peptic ulcer. The proportion of conscripts 
over forty is very much less than that of younger men. Hence the actual number of younger men 
with peptic ulcer may well be greater than the actual number of men over 40 years of age. If 
so, there are more conscripts with peptic ulcer of whom one can correctly assert that they are 
not yet 40 years old. 
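
The fruit-fly odds quoted above can be checked by exact arithmetic. The sketch below assumes the figures of the illustration (95 normal and 5 carrier females per 100, probabilities ½ and ⅔ of a female offspring, 60 females among 100 offspring); the function name `posterior_odds` is ours, not part of the original argument.

```python
from fractions import Fraction
from math import comb

def posterior_odds(prior_a, prior_b, p_a, p_b, n, x):
    """Odds in favour of hypothesis A over B after x successes in n trials."""
    like_a = comb(n, x) * p_a**x * (1 - p_a)**(n - x)
    like_b = comb(n, x) * p_b**x * (1 - p_b)**(n - x)
    return (prior_a * like_a) / (prior_b * like_b)

half, two_thirds = Fraction(1, 2), Fraction(2, 3)

# True priors: 95 normal females to 5 carriers
odds_true = posterior_odds(Fraction(95, 100), Fraction(5, 100), half, two_thirds, 100, 60)
# Bayes' postulate: the priors gratuitously equalised
odds_equal = posterior_odds(Fraction(1, 2), Fraction(1, 2), half, two_thirds, 100, 60)

print(float(odds_true), float(odds_equal))
```

Exact computation gives roughly 6.7 : 1 with the true priors and roughly 0.35 : 1 with equal priors, in close agreement with the rounded figures cited; the two differ by exactly the prior odds of 19 : 1.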

In short, Bayes’ postulate is the vulgar error of neglecting the population at risk. The 
true prior probabilities which it gratuitously equalises are in this context the age-standard- 
ising weights which make the balance sheet of risk a true bill. Though we may not 
be able to attach an exact figure to them, we may be able to set some agreed limit on their 
relative values. In any case, the circumstance that we cannot do so with assurance constitutes 
no justification for assuming equality. The fact is that one undertakes an experiment to test
a hypothesis either because one has good reason to believe in its truth or because one has good 
reason to suspect its falsity.* Good investigators do not commonly undertake experiments 
unless they have one or other end in view, i.e. unless there is factual basis for the belief that 
the prior probabilities are unequal. 

Much misunderstanding arises through speaking of the prior probability of a hypothesis 
when we cannot indeed distinguish between a hypothesis which specifies a parameter definitive 
of an existent population at risk and a hypothesis which specifies a parameter definitive of one 
which conceivably might exist. To be sure we can say that its prior probability is zero if the
population specified by it is non-existent ; but the distinction, if formally trivial in this sense, 
is useful in another. Sometimes, as in the Model I situation of 20.06 below, we may postulate 
a sampling process which involves only one level of choice. By definition, we may then assign 
unity as prior probability to the correct hypothesis and zero as that of any conceptual alternative. 
In the general model situation—Model II of 20.08—to which Bayes’ balance sheet is relevant 
we conceive a 2-stage sampling process, the first being the choice of the urn or of the individual 
fruitfly in our previous examples. The impossibility of assigning in most real situations appropriate
numerical values to the prior probabilities is thus only one horn of the dilemma with


* F. J. Anscombe (1951), Mind, Vol. 60, rightly comments as follows :


“As soon as any proposition or hypothesis has been formulated which is worth testing experimentally,
there is already evidence as to its truth derived from existing accepted knowledge and from considerations
of analogy or ‘consilience’. A question to which we have no grounds whatever for hazarding an answer is
an idle question and would not be the subject of scientific investigation.”
which Bayes’ theorem confronts us. The other is that it is often difficult to decide which model
is appropriate to the real situation. Remarkable recent advances in the theory of test procedure 
and of estimation dealt with below have come about by formulating decision criteria in the 
derivation of which the prior probabilities cancel out. Neither our ignorance of their true values 
nor of their effective relevance then prevents us from assigning a firm uncertainty safeguard 
to an unconditional assertion. 
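TABLE 5
Relative Truth Table
(Bayes’ Model of Tables 1-4)


Urns          Relative Truth Ratio

I and II      $B_{12} = \dfrac{P_{1.x}}{P_{2.x}} = \dfrac{P_1 \cdot P_{x.1}}{P_2 \cdot P_{x.2}} = \dfrac{P_1 \cdot p_1^x (1 - p_1)^{r-x}}{P_2 \cdot p_2^x (1 - p_2)^{r-x}}$

I and III     $B_{13} = \dfrac{P_{1.x}}{P_{3.x}} = \dfrac{P_1 \cdot P_{x.1}}{P_3 \cdot P_{x.3}} = \dfrac{P_1 \cdot p_1^x (1 - p_1)^{r-x}}{P_3 \cdot p_3^x (1 - p_3)^{r-x}}$

II and III    $B_{23} = \dfrac{P_{2.x}}{P_{3.x}} = \dfrac{P_2 \cdot P_{x.2}}{P_3 \cdot P_{x.3}} = \dfrac{P_2 \cdot p_2^x (1 - p_2)^{r-x}}{P_3 \cdot p_3^x (1 - p_3)^{r-x}}$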


Table 5 embodies the information of Table 4 in a different but sufficiently explicit way. We
may speak of the fractions designated $B_{12}$, etc., in our relative truth table as the Bayes’ ratio
for alternative hypotheses I and II, etc. Now we can dissect $B_{12}$, etc., into two components
if we write

$$k_{12} = \frac{P_1}{P_2} \quad \text{and} \quad S_{12.r} = \frac{p_1^x (1 - p_1)^{r-x}}{p_2^x (1 - p_2)^{r-x}} \qquad \text{(ii)}$$

In more general terms we may then write for r-fold samples containing x red balls

$$B_{ij.r} = k_{ij} \cdot S_{ij.r} \qquad \text{(iii)}$$

The expressions in the numerator and denominator of (ii) have a special meaning which permits
us to interpret Bayes’ ratio in a new way. Whereas $P_{x.h}$, etc. defined above is the probability
assigned by a particular hypothesis (h) to a sample specified by the score x, the expression
$p_h^x (1 - p_h)^{r-x}$ has a more restricted meaning which is clear if we consider the possible ways
in which we can score 2 heads and 2 tails in a 4-fold toss of an unbiased penny, viz. :

HHTT    TTHH
HTTH    THHT
HTHT    THTH

The probability of each such sequence consistent with the 4-fold sample score x = 2 is $(\tfrac{1}{2})^4$,
and the term $4_{(2)} = 6$ in the expression for the probability that the sample score is in fact 2,
viz. : $6(\tfrac{1}{2})^4$, specifies the number of different permutations of 4 items, 2 alike of one sort and 2
alike of the other. In short, the ratio $S_{ij}$ of (ii) above is the ratio of the probabilities assigned
by 2 hypotheses to the occurrence of a particular score sequence. On this account we may speak
of it as a Sequential Ratio.

The fact last stated gives the sequential ratio a special interest vis-à-vis the problem of
economical inspection as stated above (p. 842). At each successive unit trial, it assumes a new
value which cannot oscillate outside fixed boundaries and must approach more and more closely
to a fixed limit. To see this property in action let us consider a numerical example. We
postulate : (i) two types of urn, A containing red and black balls in the ratio 1 : 1 and B containing
red and black balls in the ratio 2 : 1 ; (ii) sampling with replacement, the score x being the
number of red balls in the r-fold sample. Thus our hypotheses are

Hypothesis A    $p_a = \tfrac{1}{2}$ ;    $p_a^x (1 - p_a)^{r-x} = (\tfrac{1}{2})^r$
Hypothesis B    $p_b = \tfrac{2}{3}$ ;    $p_b^x (1 - p_b)^{r-x} = 2^x / 3^r$

Whence we have

$$S_{ab.r} = \frac{3^r}{2^{r+x}}$$

From (ii) we can proceed to tabulate results of score sequences involving different total
scores in an 8-fold sample as below :


Score (x)    Sequential Ratio (S)        Score (x)    Sequential Ratio (S)
0            6561/256  = 25.6            5            6561/8192  = 0.801
1            6561/512  = 12.8            6            6561/16384 = 0.400
2            6561/1024 = 6.41            7            6561/32768 = 0.200
3            6561/2048 = 3.20            8            6561/65536 = 0.100
4            6561/4096 = 1.60
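
The tabulated values follow directly from the definition of the sequential ratio. A minimal sketch, assuming the two urn hypotheses of the example ($p_a = \tfrac12$, $p_b = \tfrac23$); the function name `sequential_ratio` is illustrative only.

```python
from fractions import Fraction

def sequential_ratio(x, r, p_a=Fraction(1, 2), p_b=Fraction(2, 3)):
    """Ratio of the probabilities which hypotheses A and B assign to one
    particular sequence of r draws containing x red balls."""
    numerator = p_a**x * (1 - p_a)**(r - x)
    denominator = p_b**x * (1 - p_b)**(r - x)
    return numerator / denominator

# Reproduce the table for the 8-fold sample
for x in range(9):
    s = sequential_ratio(x, 8)
    print(x, s, round(float(s), 3))
```

Because the binomial coefficient is common to both hypotheses it does not enter the ratio; only the score x and the sample size r matter.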


From inspection of the above we see that a critical score level $x_c = 4.5$ divides all 8-fold
samples into two sets. If $x < x_c$, $S_{ab.8} > 1$, i.e. hypothesis A assigns a higher probability
to the observed sequence than does hypothesis B. If $x > x_c$, $S_{ab.8} < 1$ and the converse
is true. Without making any claims which sidestep the Bayes’ dilemma, we may choose to be
content with: (i) accepting hypothesis A if it assigns a probability nine times as great as does
hypothesis B to the observed sequence, i.e. $S_{ab.8} > 9$ ; (ii) accepting hypothesis B if the
converse is true, i.e. $S_{ab.8} < 0.1$. If so, we may define 3 score levels as follows :


$x < 2$               content to accept A.
$2 \le x \le 7$       no decision
$x > 7$               content to accept B.


Let us now suppose that our x score is in fact 2 at the 8th trial. If so, we suspend judgment.
At the next (9th) trial the total score must be 2 or 3 ; and

$$S_{ab.9} = \frac{3^9}{2^{11}} \;\text{ or }\; \frac{3^9}{2^{12}}, \quad \text{i.e.} \quad \frac{19683}{2048} \;\text{ or }\; \frac{19683}{4096}$$

On the assumption that we accept hypothesis A when $S_{ab.9} > 9$ we shall thus reach a decision
at the 9th trial if we then score a failure (extract a black ball) and suspend judgment if we then
score a success. At each trial we may in fact suspend judgment without bias to subsequent
decision.
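
The trial-by-trial procedure just described may be sketched as follows. We assume the urn hypotheses of the example, an upper limit of 9 for accepting A and, as the converse criterion for accepting B, a lower limit of 1/9 (the text cites this to one decimal place as 0.1); the function name and the coding of draws as 1 (red) and 0 (black) are illustrative choices.

```python
from fractions import Fraction

def sequential_test(draws, p_a=Fraction(1, 2), p_b=Fraction(2, 3),
                    upper=Fraction(9), lower=Fraction(1, 9)):
    """Update the sequential ratio after each draw (1 = red, 0 = black) and
    stop as soon as it passes one of the two limits."""
    s = Fraction(1)
    for r, red in enumerate(draws, start=1):
        # A red ball multiplies S by p_a/p_b, a black ball by (1-p_a)/(1-p_b)
        s *= (p_a / p_b) if red else ((1 - p_a) / (1 - p_b))
        if s > upper:
            return "accept A", r, s
        if s < lower:
            return "accept B", r, s
    return "no decision", len(draws), s

# Score of 2 at the 8th trial, then a black ball at the 9th
print(sequential_test([1, 1, 0, 0, 0, 0, 0, 0, 0]))
# Same start, but a red ball at the 9th trial: judgment remains suspended
print(sequential_test([1, 1, 0, 0, 0, 0, 0, 0, 1]))
```

The first sequence ends in acceptance of hypothesis A at the 9th trial with S = 19683/2048; the second leaves S = 19683/4096 and calls for further sampling.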

If we now return to (iii) we can give a meaning to our test criterion in terms of Bayes’ prior
probabilities. To say that we shall be more often right than wrong if we act on the assumption
that hypothesis A is true than if we act on the assumption that the alternative hypothesis is
true is equivalent to writing $B_{ab.r} > 1$, whence

$$k_{ab} \cdot S_{ab.r} > 1, \quad \text{so that} \quad S_{ab.r} > \frac{1}{k_{ab}} \quad \text{or} \quad P_a > \frac{P_b}{S_{ab.r}} \qquad \text{(v)}$$

If we make our criterion of acceptance of hypothesis A that $S_{ab.r} > 9$ as above, we therefore
mean that we should be right in accepting it more often than in accepting hypothesis B
so long as the prior probability of the latter does not exceed 9 times that of the former. If
we make our criterion for rejecting hypothesis A (accepting the alternative) that $S_{ab.r} < 0.1$
we signify that we shall be more often right than wrong in doing so unless hypothesis A has a
prior probability more than 9 times that of hypothesis B.

The use of the sequential ratio so prescribed is different from other test procedures because 
it permits us to proceed to a decisive and unbiased verdict without prescribing in advance how 
large a sample will be necessary. This, of course, presupposes a positive answer to the question : 
can we guarantee that the outcome will eventually be decisive if we make the sample size (r)
sufficiently large ? 

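Before we attempt to answer this question, it will be instructive to formulate more precisely
a conclusion already stated: if $S_r$ and $S_{r+1}$ are sequential ratios for the rth and the (r + 1)th
sample respectively, within what limits does $S_{r+1}$ lie? Let us write $p_b = m p_a$, whence

$$S_r = \frac{p_a^x (1 - p_a)^{r-x}}{m^x\, p_a^x (1 - m p_a)^{r-x}} = \frac{1}{m^x} \left( \frac{1 - p_a}{1 - m p_a} \right)^{r-x} \qquad \text{(vi)}$$

Only two cases may arise when we enlarge the sample from r to r + 1 items: (i) x may remain
fixed if the result of the further trial is a failure ; (ii) x may increase by unity if the result is a
success. Thus we may put

$$S_{r+1} = \frac{1}{m^x} \left( \frac{1 - p_a}{1 - m p_a} \right)^{r+1-x} \quad \text{or} \quad \frac{1}{m^{x+1}} \left( \frac{1 - p_a}{1 - m p_a} \right)^{r-x} \qquad \text{(vii)}$$

Thus we have

$$\frac{S_{r+1}}{S_r} = \frac{1 - p_a}{1 - m p_a} \quad \text{or} \quad \frac{1}{m} \qquad \text{(viii)}$$

In particular, when $p_a = \tfrac{1}{2}$ we have

$$\frac{S_{r+1}}{S_r} = \frac{1}{2 - m} \quad \text{or} \quad \frac{1}{m} \qquad \text{(ix)}$$

Since $p_b < 1$, $m < 1/p_a$ in (viii) and $m < 2$ in (ix). When $p_b = \tfrac{2}{3}$ and $p_a = \tfrac{1}{2}$, $m = \tfrac{4}{3}$ ; and (ix)
becomes

$$\frac{S_{r+1}}{S_r} = \frac{3}{2} \quad \text{or} \quad \frac{3}{4} \qquad \text{(x)}$$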
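
The two possible values of the step ratio can be verified by exact arithmetic. This is a small sketch under the notation of the text ($p_b = m p_a$); the function name `step_factors` is hypothetical.

```python
from fractions import Fraction

def step_factors(p_a, m):
    """Return the two possible values of S_{r+1}/S_r:
    first after a failure (x unchanged), then after a success (x + 1)."""
    after_failure = (1 - p_a) / (1 - m * p_a)
    after_success = 1 / m
    return after_failure, after_success

# p_a = 1/2 and p_b = 2/3, so that m = 4/3
fail, succ = step_factors(Fraction(1, 2), Fraction(4, 3))
print(fail, succ)

# Consistency with the 8-fold example: S = 6561/1024 when x = 2 at the 8th trial
s8 = Fraction(6561, 1024)
print(s8 * fail, s8 * succ)
```

Starting from S = 6561/1024 the two factors reproduce the 9th-trial values 19683/2048 and 19683/4096 of the numerical example.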


As stated, a sequential test would be of little value if we had no assurance that it would
eventually terminate, i.e. that the sequential ratio will attain an upper limit A assigned as
our criterion that hypothesis A is acceptable and a lower limit B assigned as our criterion that
hypothesis B is acceptable. For simplicity we may adopt the convention that $m > 1$ in (vi)
et seq., i.e. $p_a < p_b$, and examine the consequences of the empirical rule of thumb commonly
called the law of the constancy of great numbers, viz. that x approaches its expected mean value
rp more closely as we make r larger. Thus we may say:


(i) if hypothesis A is true, x eventually approaches indefinitely near the limit $x = r p_a$ ;
(ii) if hypothesis B is true, x eventually approaches indefinitely near the limit
$x = r p_b = m r p_a$.


Let us therefore examine what values $S_a$, $S_b$ respectively the sequential ratio takes when
hypothesis A and hypothesis B are true, and r is very large in this sense.


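For the arithmetical example already cited $p_a = \tfrac{1}{2}$, and $m = \tfrac{4}{3}$, so that

$$S_r = \left( \frac{3}{4} \right)^x \left( \frac{3}{2} \right)^{r-x} = \frac{3^r}{2^{r+x}}$$

The two limiting values of x are $\tfrac{1}{2} r$ and $\tfrac{2}{3} r$, so that

$$S_a = \frac{3^r}{2^{3r/2}} = \left( \frac{9}{8} \right)^{r/2} \quad \text{and} \quad S_b = \frac{3^r}{2^{5r/3}} = \left( \frac{27}{32} \right)^{r/3}$$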


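When r is indefinitely large $S_a$ itself approaches infinity and $S_b$ approaches zero. It will suffice
to formalise this for a particular case of special interest, viz. $p_a = \tfrac{1}{2}$, $1 > p_b > p_a$ so that
$1 < m < 2$. We then have

$$S_a = \left( \frac{1}{m} \right)^{r/2} \left( \frac{1}{2 - m} \right)^{r/2} = \frac{1}{(2m - m^2)^{r/2}} \qquad \text{(xi)}$$

$$S_b = \left( \frac{1}{m} \right)^{mr/2} \left( \frac{1}{2 - m} \right)^{r - mr/2} = \frac{1}{m^{mr/2}\,(2 - m)^{(2r - mr)/2}} \qquad \text{(xii)}$$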


Since m lies inside the limits 1 and 2, $(2m - m^2)$ in (xi) lies inside the limits 0 and 1, so
that $S_a$ in (xi) is indefinitely large when r is also. The meaning of (xii) is more clear, if we
write it as

$$\frac{1}{m^r} \left( \frac{2 - m}{m} \right)^{\frac{1}{2}(mr - 2r)}$$

Since $m > 1$, the first factor in the above becomes smaller and smaller when r becomes larger.
That this is also true of the second factor is evident, if we write $m = 1 + h$ in which h is positive
and less than unity if $p_b < 1$. Thus we have


Evidently therefore $S_b$ approaches zero as its limiting value.

If then the hypotheses $p_a = \tfrac{1}{2}$ and $p_b = \tfrac{1}{2} m$ constitute an exclusive and exhaustive set in
the sense that one is true if the other is false and vice versa, the foregoing reasoning leads to 
the conclusion that the test will terminate in the rejection of the false and the acceptance of 
the true one if we make r sufficiently large. This does not imply that it will do so if a third 
hypothesis is admissible. 
A single example will suffice to make this clear. Let us again suppose that we set out to
test the alternative hypotheses $p_a = \tfrac{1}{2}$ and $p_b = \tfrac{2}{3}$. This time, however, we shall admit the
possibility that $p = \tfrac{3}{5}$ is the true value of the unknown parameter, so that x tends to $\tfrac{3}{5} r$
in the limit. Thus the limiting value of the sequential ratio will be

$$S = \frac{3^r}{2^{r + 3r/5}} = \left( \frac{243}{256} \right)^{r/5}$$

This ratio becomes smaller and smaller as r becomes larger. Hence the test will eventually
lead wrongly to acceptance of the hypothesis $p_b = \tfrac{2}{3}$ in preference to $p_a = \tfrac{1}{2}$. Alternatively,
we may admit the possibility $p = \tfrac{11}{20}$ as the true value of p. Whence we get the limiting
sequential ratio as

$$\left( \frac{3^{20}}{2^{31}} \right)^{r/20} \simeq (1.62)^{r/20}$$

This ratio becomes larger as r increases. Hence the test will eventually lead wrongly to the
acceptance of the hypothesis $p_a = \tfrac{1}{2}$ in preference to $p_b = \tfrac{2}{3}$ if the true value of p is 0.55.

The exact boundary is definable in terms which the reader can generalise. For the particular
case when our sequential ratio is referable to the hypotheses $p_a = \tfrac{1}{2}$ and $p_b = \tfrac{2}{3}$, we
shall postulate a true value $p = \tfrac{1}{2} k$ so that x tends to the limit $\tfrac{1}{2} kr$ and S to the limit

$$S = \frac{3^r}{2^{r + \frac{1}{2} kr}} = \left( \frac{9}{2^{k+2}} \right)^{r/2}$$

The test will not terminate if $S = 1$, i.e. if $2^{k+2} = 9$, and

$$\log 9 = (k + 2) \log 2, \qquad k = \frac{\log 9}{\log 2} - 2 \simeq \frac{117}{100}$$

If we set $p_a = \tfrac{1}{2}$ and $p_b = \tfrac{2}{3}$, the sequential ratio will diminish as r increases if $p > \tfrac{117}{200}$ and
increases as r increases if $p < \tfrac{117}{200}$ as the two examples last cited show. The reader should not
find it difficult to generalise this result, the implications of which are clear. The interpretation 
of the test procedure against the background of the Bayes’ ratio presupposes that each alternative 
hypothesis has a finite prior probability. If both hypotheses are wrong and both are inad- 
missible, the test will terminate in a wrong decision. This raises the question : can we formulate 
the sequential test procedure in terms of alternative hypotheses which cannot both be wrong ? 
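
The behaviour of the sequential ratio under an inadmissible third hypothesis can be examined numerically. The sketch below assumes the test hypotheses $p_a = \tfrac12$ and $p_b = \tfrac23$ and computes the average change in log S per trial when the true proportion is some other value; the function names `log_drift` and `boundary` are ours.

```python
from math import log

def log_drift(p_true, p_a=0.5, p_b=2/3):
    """Long-run average change in log S per trial when the true proportion
    of successes is p_true (x tends to r * p_true)."""
    on_success = log(p_a / p_b)              # red ball: S is multiplied by p_a/p_b
    on_failure = log((1 - p_a) / (1 - p_b))  # black ball: by (1-p_a)/(1-p_b)
    return p_true * on_success + (1 - p_true) * on_failure

def boundary(p_a=0.5, p_b=2/3):
    """True proportion at which the drift vanishes and the test need not terminate."""
    on_success = log(p_a / p_b)
    on_failure = log((1 - p_a) / (1 - p_b))
    return on_failure / (on_failure - on_success)

print(boundary())        # close to 117/200
print(log_drift(0.60))   # negative: S diminishes, B is eventually accepted
print(log_drift(0.55))   # positive: S increases, A is eventually accepted
```

A negative drift means that S falls towards the lower limit and a positive drift that it rises towards the upper one, whichever hypothesis happens to be nearer the truth.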

We have already seen that we can interpret an inspection plan in terms of the alternative
hypotheses $p < p_a$ and $p > p_b$. These do not constitute an exclusive set, since it is possible
that $p_a < p < p_b$ ; but we can agree to confine the verdict of the test procedure to the denial of
one or the other, i.e. to alternative statements to the effect $p > p_a$ or $p < p_b$. Neither statement
is then inconsistent with the possibility last stated. Thus a test procedure will result in denying
one of the propositions $p = \tfrac{1}{2}$ and $p = \tfrac{2}{3}$ if designed to ensure the negation of one or other of the
hypotheses $p < \tfrac{1}{2}$ and $p > \tfrac{2}{3}$ ; but we have not as yet shown how it is possible to interpret
sequential ratio limits unless the alternatives assume the exact form $p = p_a$ or $p = p_b$. This
will be the theme of 20.05 below. The advantage of doing so, if possible, is that we can continue
to sample until we have reached a decision without prescribing in advance what size of sample
will necessarily ensure a conclusive outcome.
20.03 LIMITATIONS OF THE UNIQUE NULL HYPOTHESIS


For the past generation research workers engaged on agricultural trials, tests of the efficacy 
of therapeutic or prophylactic measures, sociological field work and bioassay have relied for 
validification of their results on decision tests devised by R. A. Fisher and his co-workers in 
conformity with a familiar pattern. Such tests entail: (a) the invocation of a unique so-called 
null hypothesis which prescribes the frequency with which a sample score will lie outside a 
prescribed limit (or limits); (b) the specification of a criterion of rejection, i.e. the convention 
to reject the hypothesis if the sample score does in fact lie outside the prescribed limit(s). 
Customarily (and oddly) the corresponding limiting frequency adopted is at the 95 per cent. 
(approximately 20: 1 odds) level; and the possibility of defining it in such terms resides in 
the fact that the unique hypothesis chosen for the purpose has an assignable distribution function. 
With one notable exception the latter is specifiable, though rarely recognised as such, within 
the framework of the Type System of Karl Pearson.* 

From one viewpoint, the prevalence of the fashion referred to is comprehensible. The
publication of Statistical Methods for Research Workers prepared the way for manuals by Snedecor, 
Tippett, Hagood, Quenouille and others, exhibiting schemata for computation in conformity 
with the Fisher test prescriptions. By recourse to a wealth of exemplary material the research 
worker willing to take the test prescription on trust can therefore readily, it may be all too readily, 
select a type specimen at least seemingly like his or her own problem. None the less, there 
must be among those who do so, not a few who have felt misgiving for any (or all) of several 
reasons, notably the following : 


(a) not infrequently the form of the null hypothesis is irrelevant to the main issue, e.g. 
as when the decision that two treatment procedures have different results is of trivial 
interest in comparison with the decision that treatment B is at least so much more effective 
than treatment A ;

(b) the type of decision which concerns the investigator determines the choice of a particular
null hypothesis far less than considerations of algebraic convenience vis-à-vis
the specification of a sample distribution ; 

(c) the test prescriptions take no stock of any alternative hypothesis which may indeed 
be the main concern of the investigator. 


The first misgiving has special reference to the domain of estimation, and as such to the 
theory of confidence specially associated with the names of J. Neyman and E. S. Pearson. The 
second and third raise issues which a theory of test procedure also advanced by Neyman and 
Pearson has brought into focus; but their critique of the unique null hypothesis has to date 
exerted little influence on research workers outside America. This is less because their writings 
lack the polemic vitality of their predecessors than because the concepts invoked are logically 
subtle and on that account difficult to assimilate unless examined against a background of 
familiar material. The aim of what follows is to help the laboratory or the field research worker 
to recognise pitfalls in previously accepted test procedures and to materialise some of the essen- 
tially novel concepts of the Neyman-Pearson approach. 


* Thus Snedecor’s variance ratio F-test described as a score transformation of Fisher’s z-test is really a type VI, 
the Gosset t-test a type VII, the distribution of the sample variance a type III, including as a special case the 
current test for the 2 x 2 Table (1 d.f.) which is (as Fisher himself first pointed out) formally equivalent to the normal 
proportionate score difference test in the binomial domain, the significance test for the correlation ratio and for 
Spearman’s rank correlation coefficient is a type II, and the best fitting curve for a non-replacement sampling 
distribution in the 2-class universe is a type I. Fisher's distribution of the product-moment index for non-zero 
covariance when regression is linear is the notable exception to the foregoing remarks. 
One way to do so is to formulate a biological problem which involves no predilection for
a single hypothesis as the null one in virtue of algebraic convenience as such. Accordingly,
we shall again (p. 852) think of a culture of Drosophila containing normal females and females 
which carry a sex-linked lethal gene. With Bacon we shall concede that nature is more diverse in 
her operations than man in his conceptions, but our knowledge of laboratory conditions (pre- 
sumptively highly standardised) will justify the provisional assumption that any such female 
fruit-fly with an excessively large number of female offspring will, in fact, be either an entirely 
normal female or a lethal carrier. That is to say, we exclude such a contingency as the possibility 
that there is an endemic rare virus disease more fatal to male than to female larvae. We may then 
with some justifiable assurance postulate two hypotheses about any female in the culture : 


Hypothesis A: the female is normal, in which event the probability that any one of her
offspring will be female is $p_a = \tfrac{1}{2}$.


Hypothesis B: the female is a lethal carrier, whence the probability that any one of her
offspring is female is $p_b = \tfrac{2}{3}$.


We shall now suppose that a particular female has 144 offspring, and examine the current 
theory of test procedure when the end in view is to decide whether we shall adopt one or other 
hypothesis. Our primary concern will thus be with what the test prescribes, and as such has 
no necessary connexion with whether it leads us to a correct decision. 

We first note that each hypothesis equally prescribes for 144-fold fraternities referable 
to a single fly mother the long-run frequency of such as respectively contain 0, 1, 2... 143, 
144 females. We may specify the relevant parameters thus 


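Hypothesis    Size of sample    Probability that any          Mean no. of females     s.d. of score distribution
              (fraternity)      single offspring is female    in sample fraternity    of the sample
A             144               $p_a = \tfrac{1}{2}$          $M_a = 72$              $\sigma_a = 6$
B             144               $p_b = \tfrac{2}{3}$          $M_b = 96$              $\sigma_b \simeq 5.66$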


From an algebraic viewpoint neither hypothesis specified above has anything to commend it 
as preferable to the alternative ; but we may lazily and arbitrarily agree to consider first of all 
the consequences of adopting A as the null hypothesis in the traditional sense, if only because 
laboratory and field workers would commonly do so in a comparable situation. Lazily and 
arbitrarily also, we shall first adopt a modular criterion of rejection for the same reason, i.e.
we shall reject the hypothesis chosen unless the number of females x is such that

$$|x - M_a| < X_a$$

In conformity likewise with current convention, we shall choose the score $X_a$ so that the probability
($\alpha$) that x will lie in the critical region, i.e. outside the range specified above is about
0.05 if the null hypothesis correctly describes the situation. For samples of 144 and values of
$p_a$ (or $p_b$) anywhere (as in this example) within the range 0.1 to 0.9, the normal integral gives
an adequate quadrature at the so-called 96 per cent. significance level, if we make the appropriate
half interval correction. If we choose $X_a = 12.5$, so that $(x - M_a) = \pm X_a$ when $(x - M_a)
\simeq \pm 2.08\sigma_a$, the table of the normal integral sets $\alpha \simeq 0.038$. In effect, we now have made the
decision to regard the female with 144 offspring as normal if the number of her female offspring
lies in the range 60 to 84 inclusive and to reject her claims as such, i.e. in this context to regard her
as a lethal carrier, if her female offspring number more than 84 or less than 60.
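
The value of α cited above can be recovered from the normal integral without recourse to tables. The following is a sketch only: it assumes the normal approximation with half interval correction used in the text, and the helper `phi` (the normal integral expressed through the error function) is our own naming.

```python
from math import erf, sqrt

def phi(z):
    """Area of the normal curve of unit variance from minus infinity up to z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p_a = 144, 0.5
mean_a = n * p_a                     # M_a = 72
sd_a = sqrt(n * p_a * (1 - p_a))     # sigma_a = 6

half_width = 12.5                    # X_a, with the half interval correction
z = half_width / sd_a                # about 2.08 standard deviations
alpha_modular = 2 * (1 - phi(z))     # both tails excluded

print(mean_a, sd_a, round(z, 2), round(alpha_modular, 3))
```

The result lies close to the tabular figure of 0.038; the small discrepancy arises from rounding the critical ratio to 2.08 before entering the table.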

In the last sentence we assume that the end in view of the test is to arrive at a decision, as do
at least ninety-nine per cent. of laboratory workers who invoke it in communicating the results
of their researches to the world at large. Fisher himself (vide Design of Experiments, fifth edition,
1949, p. 16) says: ‘It should be noted that the null hypothesis is never proved or established
but is possibly disproved in the course of experimentation.’ A doctrinaire exponent of the
Yule-Fisher test procedure is therefore free to disclaim the intention stated in favour of an
affirmative or positive answer in contradistinction to the alternative verdicts: (a) hypothesis
false ; (b) hypothesis unproven. If so, only one of two sorts of error we shall now distinguish
is relevant to the outcome of the test ; but the evasion of the other forces us to a somewhat damaging
admission mentioned at the end of 20.04 and more fully in 20.08.

[Fig. 131. Testing a null hypothesis ($p_a = 0.5$) against the background of a single admissible alternative
hypothesis B ($p_b = 0.\dot{6}$). A. Modular Criterion: reject the Null Hypothesis if $x \le 59$ or $x \ge 85$ ;
$\alpha \simeq 0.038$, $\beta \simeq 0.021$. B. Vector Criterion: reject the Null Hypothesis if $x \ge 85$ ;
$\alpha \simeq 0.019$, $\beta \simeq 0.021$.]

In the Neyman-Pearson theory of test procedure we speak of $\alpha$ as the conditional probability
of making an error of the first kind. Now the cited value of $\alpha$ ($\simeq 0.038$) correctly
assigns the probability of rejecting the null hypothesis only on the assumption that the latter
is true, i.e. that the mother fly is normal. This we do not know, the aim of the test being to
throw light on the alternative possibility. If we carry out the rule of the test consistently,
we shall sometimes make an error of the first kind, i.e. reject normal flies as such and by the
same token wrongly identify as carriers flies which are indeed normal. Conversely, we shall
sometimes apply the test to flies which are indeed carriers. If the number of females among
their progeny lies within the range 60-84 inclusive we shall reject them as such. We shall
then wrongly accept the null hypothesis. This is the error of the second kind, which we make
in this context if the relevant parameters of the appropriate distribution are $p_b = \tfrac{2}{3}$, $M_b = 96$
and $\sigma_b \simeq 5.66$. With due regard to the half interval correction, the region we then exclude
is from 59.5 to 84.5 bounded by $(x - M_b) = -36.5$ and $(x - M_b) = -11.5$, i.e. $(x - M_b)
\simeq -6.4\sigma_b$ and $(x - M_b) \simeq -2.03\sigma_b$. Since the area of the normal integral of unit variance
from $-\infty$ up to $-6.4$ is utterly trivial, we make no sensible error if we say that the consistent
application of the rule leads us now to reject carriers as such with a probability ($\beta$) assigned
by the area of the normal curve of unit variance in the range from $-\infty$ to $-2.03$. We speak
of this loosely as the probability of making an error of the second kind; and the table of the
normal integral in this case cites the value $\beta \simeq 0.0212$. More explicitly, $\beta$ is the conditional
probability of accepting the null hypothesis, if it is false ; but we have as yet said nothing about
how often it will be false.
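
The corresponding computation for the error of the second kind may be sketched in the same way. We again assume the normal approximation with half interval correction; `phi` is our own helper for the normal integral, not a function named in the text.

```python
from math import erf, sqrt

def phi(z):
    """Area of the normal curve of unit variance from minus infinity up to z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p_b = 144, 2 / 3
mean_b = n * p_b                     # M_b = 96
sd_b = sqrt(n * p_b * (1 - p_b))     # sigma_b, about 5.66

# Acceptance region of the null hypothesis, 60 to 84 inclusive,
# widened by the half interval correction to 59.5 - 84.5
z_low = (59.5 - mean_b) / sd_b       # about -6.4
z_high = (84.5 - mean_b) / sd_b      # about -2.03
beta = phi(z_high) - phi(z_low)      # the lower tail is negligible

print(round(sd_b, 2), round(z_low, 2), round(z_high, 2), round(beta, 4))
```

The contribution of the region below 59.5 is of no practical consequence, so that β is in effect the single tail area below about −2.03 standard deviations, i.e. about 0.021.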

In short, the only information we have at our disposal so far bears on the probability ($\alpha$)
of rejecting the null hypothesis when it is true, and that ($\beta$) of rejecting the alternative when the
latter is true, i.e. of accepting the null hypothesis when it is false. If we now suppose that we
actually know the proportion of normal and carrier females in the culture, we can take our
analysis a decisive step forward. We shall assume that the culture consists of 500 mothers of
which 450 are normal and 50 are carriers. If we then choose at random * any single fly with
144 offspring as a test subject we may say that


(i) $P_a = 0.9$ is the probability that it will be normal, i.e. the probability that the null
hypothesis is applicable to the test subject ;
(ii) $P_b = 0.1 = (1 - P_a)$ is the probability that it will be a carrier, i.e. that the alternative
hypothesis is applicable to the test subject.

We now have all the relevant data for a statistical specification of the long-run frequency
of all 4 possible results of the outcome of the test :

The fly is normal and we rightly accept it as such
$P_a(1 - \alpha) = (0.9)(0.962) = 0.866$.
The fly is normal and we wrongly reject it as such
$P_a \cdot \alpha = (0.9)(0.038) = 0.034$.
The fly is a carrier and we rightly accept it as such
$(1 - P_a)(1 - \beta) = (0.1)(0.979) = 0.098$.
The fly is a carrier and we wrongly reject it as such
$(1 - P_a) \cdot \beta = (0.1)(0.021) = 0.002$.


* The effect of the lethal gene on the fertility of the fly introduces a bias for which we can allow, and one which
we may therefore deliberately neglect for heuristic purposes.


To each assertion consistent application of the rule leads us to make we may thus assign 
a probability that it will be true or false. We may then set out a balance sheet as follows : 


                         Assertion true                                 Assertion false
Null hypothesis true     $P_a(1 - \alpha) = 0.866$                      $P_a \cdot \alpha = 0.034$
Null hypothesis false    $(1 - P_a)(1 - \beta) = 0.098$                 $(1 - P_a)\beta = 0.002$
Total                    $1 - \beta - (\alpha - \beta)P_a = 0.964$      $\beta + (\alpha - \beta)P_a = 0.036$


In conformity with the definition given above, we may speak with propriety of the probability
($P_t$) of making a correct decision and of the probability ($P_f$) of making a false one by
consistent application of the rule, in which case our balance sheet yields

$$P_t = 1 - \beta - (\alpha - \beta)P_a = 96.4 \text{ per cent.} \qquad \text{(i)}$$
$$P_f = \beta + (\alpha - \beta)P_a = 3.6 \text{ per cent.} \qquad \text{(ii)}$$


For the reasons we shall come to later, the outcome of our choice of a rejection criterion
is here vastly more encouraging than need be in most situations ; but we can do better. We
have lazily adopted a modular criterion because laboratory and field workers commonly do so,
regardless of the end in view, when the sample distribution prescribed by the null hypothesis
is symmetrical. Now fraternities of 144 flies of which less than 60 are females will be vastly
less common, if the mother is a carrier, than they would otherwise be. It would thus seem
to be more reasonable to restrict our attention to families with an excessive number of females.
We shall now therefore adopt a vector criterion, i.e. reject as abnormal only mothers with more
than 84 female offspring, so that we exclude only one tail of the approximately normal
distribution and halve our error of the first kind, i.e. set α = 0.019. For reasons stated this does not
materially affect the value of β since the chance that a carrier will have less than 60 females
among 144 offspring is negligible. If we then say that we shall reject the null hypothesis at
the vector level +2.08σ in contradistinction to the modular level ±2.08σ, we now put α = 0.019
but β = 0.021 as before. Whence our balance sheet summarised by (i) and (ii) becomes


P_t = 0.981 = 98.1 per cent. ; P_f = 0.019 = 1.9 per cent.
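
The two conditional risks of the vector criterion can be recovered from the normal approximation to the binomial. The sketch below is illustrative only: the function name is hypothetical, and the parameters p = 1/2 (normal mother) and p = 2/3 (carrier) are inferred from the means M_a = 50 and M_b = 66.6 which the text quotes for fraternities of 100.

```python
from math import sqrt
from statistics import NormalDist


def vector_criterion_risks(n, p_null, p_alt, cutoff):
    """Normal-approximation risks for the rule 'reject the null if x > cutoff'."""
    std_normal = NormalDist()
    mean_null, sd_null = n * p_null, sqrt(n * p_null * (1 - p_null))
    mean_alt, sd_alt = n * p_alt, sqrt(n * p_alt * (1 - p_alt))
    # Error of the first kind: upper tail beyond the cutoff under the null.
    alpha = 1.0 - std_normal.cdf((cutoff - mean_null) / sd_null)
    # Error of the second kind: lower tail below the cutoff under the alternative.
    beta = std_normal.cdf((cutoff - mean_alt) / sd_alt)
    return alpha, beta


# 144 offspring, decision limit between 84 and 85 females.
alpha, beta = vector_criterion_risks(n=144, p_null=1 / 2, p_alt=2 / 3, cutoff=84.5)
p_false = beta + (alpha - beta) * 0.9  # equation (ii) with P_a = 0.9
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}, P_f = {p_false:.3f}")
```

To three places the sketch should agree with the figures in the text; small differences in the last digit can arise from rounding of the tabulated normal integral.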


That the adoption of the vector criterion does in fact give a better prognosis of correct 
decision is not surprising, and Fig. 131 sufficiently exhibits why this is so in the situation under 
discussion. Indeed the use of a modular criterion, though sanctioned by custom, is meaning- 
less in such a situation. 

At this stage we may also note with profit an interesting consequence of (i). If α = β
so that 1 − β = 1 − α and (α − β) = 0, equation (i) reduces to P_t = (1 − α). Within the
framework of our assumptions that there is only one admissible alternative to the null hypothesis,
so that P_b = (1 − P_a), we can assign a value to the long-run frequency of correct decision based
on consistent application of the rule without any prior knowledge (P_a or P_b) of the population
at risk if we define our rejection criterion in such a way as to equalise the probabilities of errors of
the two kinds. We can then predetermine that the value of α may be as small as we care to make
it by prescribing a sample size sufficiently large. Needless to say, this presupposes the possibility
of defining the distribution function of the single admissible alternative hypothesis.


Fig. 132. Testing exclusive Alternative Hypotheses : (i) Rejection-Acceptance Criterion chosen to make
error of first kind equal to error of second kind. (Hypothesis A : M_a = 19.8, σ = 5 ; Hypothesis B :
M_b = 36.2, σ = 5 ; 100-fold sample ; criterion of acceptance for Hypothesis A : x < 28, i.e. 1.64σ
from either mean ; α = 0.05, β = 0.05.)
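
The equalising criterion of Fig. 132 can be sketched numerically. The helper names below are hypothetical; the means and standard deviations are those given in the figure, and the sampling distributions are taken as normal.

```python
from statistics import NormalDist


def equalising_cutoff(mean_null, sd_null, mean_alt, sd_alt):
    """Score at which the standardised distances from both means are equal."""
    return (mean_null * sd_alt + mean_alt * sd_null) / (sd_null + sd_alt)


def conditional_risks(cutoff, mean_null, sd_null, mean_alt, sd_alt):
    """Risks for the rule 'accept the null if x < cutoff'."""
    alpha = 1.0 - NormalDist(mean_null, sd_null).cdf(cutoff)
    beta = NormalDist(mean_alt, sd_alt).cdf(cutoff)
    return alpha, beta


cutoff = equalising_cutoff(19.8, 5.0, 36.2, 5.0)
alpha, beta = conditional_risks(cutoff, 19.8, 5.0, 36.2, 5.0)

# With alpha equal to beta, equation (i) gives the same P_t for every prior.
p_true_by_prior = {
    prior_null: 1.0 - beta - (alpha - beta) * prior_null
    for prior_null in (0.1, 0.5, 0.9)
}
print(f"cutoff = {cutoff:.1f}, alpha = {alpha:.4f}, beta = {beta:.4f}")
print(p_true_by_prior)
```

Whatever value of P_a we insert, the probability of correct decision stays at 1 − α, which is the point of the paragraph above.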

Within the framework of assumptions and in the same model set-up, let us now explore
the effects of making our criterion for rejecting the null hypothesis more exacting in the sense
that our error of the first kind is less. Thus we shall decide to accept a female fly with 144
offspring as normal if (vector criterion) she has 88 or less female offspring and deem her (rightly
or wrongly) to be a lethal carrier if she has 89 or more. We then set the decision limits on
either side of x = 88.5, in which event (x − M_a) = +2.75σ_a, and (x − M_b) = −1.33σ_b.
Whence from the table of the normal integral we derive α = 0.003 and β = 0.092. If we
put these values in (i) we get


1 − β = 0.908 and (α − β)P_a = −0.080,

∴ P_t = 0.988 or 98.8 per cent.
In this situation little advantage (98.8 as against 98.1 per cent.) accrues from making our criterion
for rejection of the null hypothesis more exacting ; but we have chosen our null hypothesis
as the hypothesis with greater prior probability since the culture contains a great excess of normal
flies. Let us then reverse the situation by postulating that the culture contains 450 lethal
carriers and 50 normal among 500 female flies in all, i.e. P_a = 0.1 and P_b = 0.9. In this case


(α − β)P_a = −0.009, so that
P_t = 0.908 + 0.009 = 0.917 or 91.7 per cent.


Fig. 133. Testing the same exclusive alternative hypotheses as in Fig. 132 and same sample size, choice of
rejection-acceptance criterion which makes the error of the first kind smaller, makes the error of the second larger.
(Criterion of acceptance for Hypothesis A : x < 32.1 ; α = 0.007, β = 0.206.)



If the null hypothesis is referable to the smaller population at risk (i.e. if it has lower prior 
probability than the alternative) the effect of making the rejection criterion more exacting is to 
lower the probability of arriving at a correct decision. 
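
The contrast between the two cultures can be checked with a few lines of Python. This is a sketch under the normal approximation, with hypothetical names; the parameters p = 1/2 and p = 2/3 are inferred from the means quoted in the text, and the decision limits 84.5 and 88.5 are those of the two vector criteria discussed above.

```python
from math import sqrt
from statistics import NormalDist


def p_correct(n, p_null, p_alt, cutoff, prior_null):
    """Equation (i) for the rule 'reject the null if x > cutoff'."""
    std_normal = NormalDist()
    alpha = 1.0 - std_normal.cdf((cutoff - n * p_null) / sqrt(n * p_null * (1 - p_null)))
    beta = std_normal.cdf((cutoff - n * p_alt) / sqrt(n * p_alt * (1 - p_alt)))
    return 1.0 - beta - (alpha - beta) * prior_null


# Keys are (decision limit, prior probability of the null hypothesis).
results = {
    (cutoff, prior_null): p_correct(144, 1 / 2, 2 / 3, cutoff, prior_null)
    for cutoff in (84.5, 88.5)
    for prior_null in (0.9, 0.1)
}
for key, value in results.items():
    print(key, round(value, 3))
```

When normal flies predominate (P_a = 0.9) the more exacting limit 88.5 gives the higher P_t; when carriers predominate (P_a = 0.1) it gives the lower one.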

Before discussing how far this rule is of general application within the framework of our 
model assumptions, let us take stock of another highly relevant variable, viz. sample size. For 
a fixed size of sample the foregoing results have sufficiently emphasised what a visual diagram 
suffices to demonstrate (Figs. 132-134), i.e. we cannot decrease the conditional probability (α)
of an error of the first kind without increasing the conditional probability (β) of an error of the
second kind and vice versa. It is also of importance to appreciate that we can make β for a
preassigned value of α as small as we wish to make it, only if we make the size of the sample
large enough. Conversely, we can keep α at a preassigned level for a smaller test sample
only by making β larger.
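
The dependence of β on sample size for a preassigned α can be sketched as follows. The function name is hypothetical, the decision limit is treated as a continuous score (the discreteness of counts is ignored), and the parameters p = 1/2 and p = 2/3 are again inferred from the means quoted in the text.

```python
from math import sqrt
from statistics import NormalDist


def beta_for_fixed_alpha(n, p_null, p_alt, alpha):
    """Error of the second kind when the upper-tail limit is set to give alpha."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha)
    # Decision limit on the score scale under the null hypothesis.
    cutoff = n * p_null + z_alpha * sqrt(n * p_null * (1 - p_null))
    return std_normal.cdf((cutoff - n * p_alt) / sqrt(n * p_alt * (1 - p_alt)))


betas = {
    n: beta_for_fixed_alpha(n, p_null=1 / 2, p_alt=2 / 3, alpha=0.019)
    for n in (50, 100, 144, 200, 400)
}
for n, beta in betas.items():
    print(f"n = {n:3d}: beta = {beta:.4f}")
```

For a fixed α the printed values of β fall steadily as the fraternity grows, which is the relation the paragraph asserts.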

Consider for example the consequence of applying the foregoing test to fraternities of 100,
so that M_a = 50 ; σ_a = 5 ; M_b = 66.6 and σ_b = 4.71. If we make the vector criterion of
acceptance or rejection conveniently near the 2σ level, we shall set it on either side of a score
level 60.5 in which event (x − M_a) is approximately 2.1σ_a, and α ≈ 0.036. If so, (x − M_b)
is approximately −1.32σ_b, and β ≈ 0.0934. The value of α here agrees as closely as need
be for exemplary purposes with the value chosen (α = 0.038) for the 144-fold sample when
β = 0.021. Thus the effect of reducing the sample size is to increase more than 4-fold the
probability of an error of the second kind for a corresponding probability of error of the first
kind.


Fig. 134. Testing the same exclusive alternative hypotheses as in Fig. 132 and same sample size, choice of
rejection-acceptance criterion which makes the error of the second kind smaller, makes the error of the first larger.

In this case we can make our two error risks nearly equal by setting our limits of rejection
and acceptance for the null hypothesis on either side of the score x = 58.5, in which event
the null hypothesis sets the upper limit of acceptance at +1.7σ_a, and the alternative sets the
lower limit for rejection at −1.74σ_b. Thus α ≈ 0.045 and β ≈ 0.041. If P_a = 0.9 as in
our first example P_t = 95.5 per cent. For the 144-fold sample we obtained P_t = 98.1 per
cent. when the two conditional risks were nearly equal.

Before we go further, we may well retrace our steps. We made the arbitrary decision to 
designate as our null hypothesis the assertion that the mother fly is normal. Actually, we 
have given no reason for doing so ; and we may pause at this stage to dispose of a misconception 
widespread among those who carry out routine tests within the framework of the unique null 
hypothesis. There is prevalent a somewhat naive view that we choose our null hypothesis as 
a safeguard against wishful thinking, and that we make accordingly our criterion of rejection 
as exacting as need be. On such a view our criterion of rejection is at best a disciplinary con- 
vention ; and as such has nothing to do with unconditional statistical inference. Also one can 
justify it as such only if one chooses the null hypothesis on the understanding that one wishes 
to fall backwards in preserving one’s rectitude, i.e. if the null hypothesis is actually the one 
the investigator has reasons for believing to be false. Evidently no recipe that the best Mrs. 
Beeton can prescribe will indeed meet one’s requirements in all situations. If experiments 
on laboratory stocks have convinced the investigator that a new therapy is preferable to a current
procedure, the enthusiastic research worker will not reasonably impose on the null hypothesis
a criterion of rejection as exacting as that of the sceptical investigator undertaking experiments 
to test the credentials of extrasensory perception. In conformity with current procedure, he 
or she will nevertheless invoke a null hypothesis of the same type in either situation, and with 
the same convention (e.g. 0-05 significance level) of rejection, if accustomed to rely on current 
cookery book recipes. The reason is that the cookery book recipe will commonly prescribe
as the appropriate null hypothesis the one which commends itself to the mathematician because 
he can manipulate it algebraically, i.e. for reasons which have nothing to do with the operational 
intention of the scientific worker. 

In the model situation discussed hitherto, we have, in fact, sidestepped the temptation 
to choose our null hypothesis for this reason, since it would be equally easy to adopt as such 
the postulate that the fly mother is a lethal carrier. A rejection criterion identical in terms 
of the conditional risk of error of the first kind, as is indeed the most we can specify within the 
framework of a unique null hypothesis, would then lead us to results numerically inconsistent 
with those we have so far explored. The reader may check this assertion by reversing the role 
of the two hypotheses in the foregoing examples. 

Partly because of the size of the samples chosen, previous tests in our model situation have 
led to a high probability of correct decision arrived at in conformity with traditional procedure, 
i.e. within the framework of the unique null hypothesis. This may lead us to a totally wrong 
view of what we can rely on it to accomplish, if we fail to take stock of two background con- 
ditions plausibly invoked in the prescribed set-up, but rarely admissible in other situations, 


(a) we concede only one admissible alternative to the null hypothesis ; 


(b) we have postulated a complete specification of the sampling distribution in terms of
the alternative thereto. 


It will be simpler, if we first examine the implications of (b). In all examples hitherto 
cited we have found that P_t > 0.5, i.e. that more than 50 per cent. of our decisions will be right
if we consistently follow the last prescription, in which event we shall be more often right than 
wrong. Now there is no reason why this should be so, other than the fact that we can here fix 
in advance the size of the sample and the criterion of rejection or acceptance for the null hypothesis 
with due regard to the value of the relevant parameters of both hypotheses. To clarify the relevance 
of the consideration last stated, let us now replace the female carriers of a sex-linked lethal gene 
by females with a virus infection to which their male offspring succumb somewhat more readily 
than their sister flies. We shall postulate a sex ratio of 11:9 in favour of females among the 
progeny of infected mothers. Our alternative hypothesis is now that p_b = 0.55.

On the new hypothesis which we again provisionally assume to be the only admissible
alternative to the null one (p_a = ½), we have M_b = 55 and σ_b ≈ 4.98 for fraternities of 100.
If we set our rejection criterion on either side of x = 60.5 we have as before (x − M_a) = +2.1σ_a
and (x − M_b) = +1.1σ_b, whence α = 0.018 and β = 0.864. Thus (1 − β) = 0.136 and
(α − β) = −0.846, whence from (i)


P_t = 89.7 per cent. when P_a = 0.9,
P_t = 22.1 per cent. when P_a = 0.1.


This example illustrates the important role of the prior probabilities P_a and P_b = (1 − P_a)
which define the populations at risk under each hypothesis. If P_a = 0.1 we have P_t = 22 per
cent. and P_f = 78 per cent., i.e. consistent application of the rule will lead us to be wrong more
often than right when the sample is as small as 100, but we can fix the size of the sample to
ensure that P_t > ½ only if we can assign at least a lower limit to P_a, and then only if we can
assign a value to β. Now we cannot assign a value to β unless we can specify the appropriate
parameter (in this case p_b) or parameters of the sample distribution of the admissible alternative
hypothesis. In any case, it will rarely happen that we as easily conceptualise the meaning we
can here confer on P_a, and still more rarely that we can equip it with a numerical value.
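
A short sketch makes the dependence on the prior explicit for the virus example. The names are hypothetical; the risks α = 0.018 and β = 0.864 are those quoted above, and the threshold formula applies only when β exceeds α, as it does here.

```python
def p_correct(prior_null, alpha, beta):
    """Equation (i): long-run proportion of correct decisions."""
    return 1.0 - beta - (alpha - beta) * prior_null


alpha, beta = 0.018, 0.864  # values quoted in the text for fraternities of 100

p_true_common_null = p_correct(0.9, alpha, beta)
p_true_rare_null = p_correct(0.1, alpha, beta)

# Solving 1 - beta + (beta - alpha) * P_a > 1/2 for P_a gives the least prior
# probability of the null hypothesis for which we are more often right than wrong.
threshold_prior = (0.5 - (1.0 - beta)) / (beta - alpha)

print(f"P_t when P_a = 0.9: {p_true_common_null:.3f}")
print(f"P_t when P_a = 0.1: {p_true_rare_null:.3f}")
print(f"P_t exceeds 1/2 only if P_a exceeds {threshold_prior:.3f}")
```

The last line shows why a lower limit to P_a is indispensable: with these risks the rule is right more often than wrong only if normal mothers make up more than about 43 per cent. of the culture.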

The considerations last stated do not exhaust limitations of current test procedure within 
the framework of the unique null hypothesis. We have so far assumed a single admissible 
alternative and we can very rarely, if ever, make such an assumption with propriety. Indeed, 
much statistical enquiry goes on against a background of an infinitude of admissible alternatives 
to any null hypothesis we may advance. In our model set-up we have already adumbrated 
this complication ; and we may now with profit postulate 3 admissible hypotheses. We shall 
then get into focus a general conclusion concerning the utmost we can legitimately infer in 
the domain of unconditional assertion from the outcome of a decision test referable to a unique 
null hypothesis in the absence of background information concerning admissible alternatives 
thereto. We may anticipate it by reinterpreting (ii) above. When β > α in (ii), (α − β) is
negative and P_f < β, i.e. the uncertainty safeguard is less than the greater conditional risk. When
β < α, (α − β) is positive and P_f > β, i.e. the uncertainty safeguard is greater than the smaller
conditional risk.

The conclusions last stated refer to a situation in which only two hypotheses are admissible. 
An examination of a model situation in which three hypotheses are admissible will exhibit it 
as a particular case of a general proposition expressible as follows on the assumption that our 
concern is with the long-run proportion of true assertions we make on the basis of a test pro- 
cedure of the type under discussion : 


(i) the worst that can happen is that we shall exclusively encounter situations prescribed
by the hypothesis associated with the greatest conditional risk (p_g) of erroneous rejection ;
(ii) the best that can happen is that we shall exclusively encounter situations prescribed by
the hypothesis associated with the smallest conditional risk (p_s) of erroneous rejection.


Our new model will be that the following three hypotheses are admissible w.r.t. our fruitfly 
culture ; and we therefore start by assuming that we can define the relevant parameters and 
sample distributions referable to each of them: 


Hypothesis A : The female is normal,                    p_a = 1/2
Hypothesis B : The female transmits a virus infection,  p_b = 11/20
Hypothesis C : The female carries a sex-linked lethal,  p_c = 2/3


We shall designate the proportions of the 3 types of female flies as respectively P_a, P_b, P_c
so that (P_a + P_b + P_c) = 1. In real life there is no reason to exclude the possibility that a
lethal carrier could transmit the virus ; but we assume for argument that such flies are 100 per
cent. resistant to it. We are free to select any one of the above as our null hypothesis ; and
shall first assume that our null hypothesis is p = p_a. In that event we may choose a single
rejection criterion x > x_a, thereby fixing the conditional risk of rejecting the null hypothesis
when true as α, that of rejecting hypothesis B when true as β and that of rejecting hypothesis
C when true as γ. The unconditional probability of an error of the first kind is αP_a. We
accept the null hypothesis erroneously, if we either reject B when true or reject C when true.
Whence the probability of wrong acceptance is βP_b + γP_c, and the probability of making a false
decision of one sort or the other is


P_f = αP_a + βP_b + γP_c . . . (iii)
This is equivalent to
P_t = (1 − P_f) = 1 − αP_a − βP_b − γP_c.

If our null hypothesis is p = p_a, our rejection criterion must make γ < β since p_b lies nearer
to p_a than does p_c. If we define it so that α = β whence α > γ, we may write (iii) in the form


P_t = 1 − α(P_a + P_b) − γP_c
= 1 − α(1 − P_c) − γP_c,
∴ P_t = 1 − α + (α − γ)P_c.
Since α > γ on the assumption that α = β, (α − γ) is positive and
P_t > 1 − α and P_f < α . . . (iv)

We have here arbitrarily chosen as our null hypothesis p = p_a. In principle, the procedure
would be alike if we chose as our null hypothesis p = p_c. If we choose the second hypothesis,
it will be different, because we must now adopt a modular rejection criterion, i.e. we reject
hypothesis B in favour of A if x < x_a and reject hypothesis B in favour of C if x > x_c. We
shall then denote the risk of rejecting the null hypothesis when true in favour of A by β_a, and that of
rejecting it when true in favour of C by β_c. The probability of erroneous rejection
is then P_b(β_a + β_c). The probability of erroneous acceptance is (αP_a + γP_c), and


P_f = P_b(β_a + β_c) + αP_a + γP_c.
If we choose our two rejection criteria so that α = β_a and γ = β_c, we then derive
P_f = (P_a + P_b)β_a + (P_b + P_c)β_c
= (1 − P_c)β_a + (1 − P_a)β_c,
∴ P_f < (β_a + β_c),
∴ P_t > 1 − (β_a + β_c) . . . (v)


Since the conditional risk of an error of the first kind is (β_a + β_c) = β, the choice of B
as our null hypothesis leads us to the same result as (iv), i.e.


P_f < β.


Thus we can always choose our rejection criterion or criteria to make the uncertainty safeguard 
less than the conditional risk of error of the first kind when either of two alternatives to the 
null hypothesis is admissible each being specifiable. With the same reservation we can make 
the probability of correct decision as near to unity as we care by making the size of the sample 
appropriately large. 

So far we have assumed a backstage view of the situation, i.e. that we can in fact specify 
precisely each admissible alternative to the null hypothesis; and it will rarely happen that 
we can do so. If we cannot specify either admissible alternative, we may restate the foregoing 
argument in more general terms recalling that the form of the uncertainty safeguard does not 
depend on the choice of a modular or vector criterion of rejection. If p_h is the conditional risk
of rejecting hypothesis H when true, we may write our uncertainty safeguard as follows for a
situation involving only three hypotheses with definitive parameters p_a, p_b, p_c and prior
probabilities P_a, P_b, P_c :


P_f = p_a·P_a + p_b·P_b + p_c·P_c . . . (vi)


In this expression 


The probability of wrong rejection is (P_a·α_1 + P_a·α_2) = αP_a. That of wrong acceptance is
P_b·β + P_c·γ, and
P_f = αP_a + P_b·β + P_c·γ . . . (vii)
In this case, we may note that the choice of the criterion x_1 so that β = α_1 and of x_2 so that
γ = α_2 implies
P_f = (P_a + P_b)α_1 + (P_a + P_c)α_2 = (1 − P_c)α_1 + (1 − P_b)α_2,
∴ P_f = α_1 + α_2 − (P_c·α_1 + P_b·α_2).


Whence P_f < α and in agreement with (iv) above P_t > 1 − α. However, we are here exploring
a situation in which we cannot fix x_1 or x_2 to fulfil the condition stated, since we cannot assign
numerical values to either p_b or p_c. If so, we can at least say that


P_f < α + β + γ.


We can, however, go one step further. We shall now generalise the utmost we can legitimately 
infer from the performance of a decision test within the framework of a unique null hypothesis, as 
we have been accustomed to perform it, i.e. against the background of an unknown number (m) 
of admissible alternatives to it, and with no precise specification of any one of them. As before, 
we shall denote by p_h the probability of rejecting the hth member of the m-fold set when it is right
and the corresponding risk of rejecting the null hypothesis (p = p_0) when right by p_0, so that


P_f = Σ_{h=0}^{m} p_h·P_h . . . (viii)


Consider now the hypothesis for which the risk of rejection (p_g) when true is greatest, so that
ε_{h.g} is positive if p_h = (p_g − ε_{h.g}) and ε_{g.g} = 0,


P_f = p_g Σ_{h=0}^{m} P_h − Σ_{h=0}^{m} P_h·ε_{h.g}

= p_g − Σ_{h=0}^{m} P_h·ε_{h.g},

∴ P_f ≤ p_g and P_t ≥ 1 − p_g . . . (ix)


If p_s is the smallest value of p_h so defined we may write p_h = (p_s + ε_{h.s}) in which ε_{h.s} is again
positive and ε_{s.s} = 0, so that


P_f = p_s Σ_{h=0}^{m} P_h + Σ_{h=0}^{m} P_h·ε_{h.s}

= p_s + Σ_{h=0}^{m} P_h·ε_{h.s},

∴ P_f ≥ p_s and P_t ≤ 1 − p_s . . . (x)


The situation last discussed suffices to get into focus the most we can say about the outcome 
of a single hypothesis decision test within the domain of unconditional inference when, as com- 
monly, we can neither state how many alternatives to it are each admissible nor specify each 
in numerical terms. If we denote by p_s and p_g respectively the least value and the greatest
value of the probability of denying any one of the set of hypotheses when true in conformity
with the rejection criterion chosen, the probability of a false verdict lies between limits definable
as follows :


p_s ≤ P_f = Σ_{h=0}^{m} p_h·P_h ≤ p_g . . . (xi)
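
Since P_f is a weighted mean of the conditional risks with the prior probabilities as weights, the limits are easy to verify by trial. The sketch below uses a purely hypothetical set of four conditional risks and random priors; none of the numbers comes from the text.

```python
import random


def p_false(risks, priors):
    """Uncertainty safeguard: sum of conditional risk times prior probability."""
    return sum(risk * prior for risk, prior in zip(risks, priors))


random.seed(0)
risks = [0.05, 0.20, 0.01, 0.60]  # hypothetical conditional risks p_h

checked = 0
for _ in range(1000):
    # Random prior probabilities summing to unity.
    weights = [random.random() for _ in risks]
    total = sum(weights)
    priors = [weight / total for weight in weights]
    value = p_false(risks, priors)
    assert min(risks) - 1e-12 <= value <= max(risks) + 1e-12
    checked += 1

worst = p_false(risks, [0, 0, 0, 1])  # only the hypothesis with greatest risk occurs
best = p_false(risks, [0, 0, 1, 0])   # only the hypothesis with smallest risk occurs
print(f"{checked} random priors checked; worst = {worst}, best = {best}")
```

The two limiting cases correspond to encountering exclusively the hypothesis with the greatest, or the smallest, conditional risk of erroneous rejection.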


Unless we can specify the parameters of each admissible alternative to the hypothesis chosen
as the null one, we can say no more about p_g in (xi) than that it is less than unity ; but if we
can specify each admissible alternative hypothesis precisely, we can choose our rejection criterion
so that the probability (α) of rejecting the null hypothesis when true is the greatest of the
conditional risks, i.e. so that α = p_g, whence
P_f ≤ α. By appropriate choice of sample size we can then make the probability of a correct
verdict on the null hypothesis as near to unity as we like without invoking any information w.r.t.
its prior probability.* We thus arrive at the following conclusion : a test procedure may
be informative in the domain of unconditional inference if, and only if, we can precisely specify
each of an exhaustive and exclusive set of hypotheses.

The last statement calls for qualification on two counts. A parameter p_h definitive of an
admissible alternative to the null hypothesis (p = p_0) may be indefinitely close to p_0 itself. If
so (Fig. 135), the corresponding risks of rejection satisfy p_h ≈ 1 − p_0 for samples of finite size
and we can make both of these risks indefinitely small only by making our sample indefinitely large.
This consideration has an important
bearing on the concept of test power touched on below. Here it is relevant because we can
rarely be certain that no such hypothesis alternative to the one chosen as null is indeed admissible.
This raises a question of pivotal importance in connexion with the foregoing exposition : in
what situations can one postulate an exhaustive and exclusive set of admissible hypotheses
which fulfil all the relevant conditions now stated ?


Fig. 135. Testing Exclusive Alternative Hypotheses. The sample size and the variance of the u.s.d. for each
hypothesis as in Fig. 132 and the null hypothesis (M_a = 19.8) unchanged. By making our alternative hypothesis
that M_b lies very near M_a without changing the size of the sample, we can make β ≈ (1 − α).
(Hypothesis A : M_a = 19.8, σ = 5 ; Hypothesis B : M_b = 20.8, σ = 5 ; 100-fold sample ; criterion of
acceptance for Hypothesis A : x < 28 ; α = 0.05, β = 0.93.)


* Neyman and Pearson (1933), “ Testing Statistical Hypotheses in Relation to Probabilities a priori,” Proc. Camb.
Phil. Soc., Vol. 29.

An important class in which the postulate of an exclusive and exhaustive set of admissible
hypotheses is legitimate arises in pathology when we can


(i) classify test subjects as healthy or sufferers from a particular disease ; 


(ii) assign a probability on the basis of laboratory experience to the assertion that a single 
test will fail to identify them correctly. 


For heuristic purposes a criterion for screening tuberculous patients cited by Neyman
will serve as a type specimen.* On the basis of laboratory experience, we assume that a single
X-ray film will : (a) fail to detect the disease in 40 per cent. of sufferers ; (b) give a
positive result for 1 per cent. of healthy test subjects. Clearly we need to make more than one
film, if we aim at a high level of satisfactory diagnosis. We shall assume that we take 5 and
adopt as our test criterion the rule that we deem the disease to be present if at least one film
is classifiable as positive. Our first hypothesis, which we may call the null hypothesis (hypothesis A), will
be that the disease is present. Our test criterion leads us to reject the hypothesis if all 5 films
are negative. The relevant parameter (p_a) is the probability of failure, in this case 0.4. Hence
the probability of rejecting the null hypothesis when true (i.e. wrongly classifying the test subject as
healthy) is
α = (0.4)^5 = 0.01024.


The alternative hypothesis (hypothesis B) is that the test subject is healthy. If so, the
probability (p_b) of getting a negative result from one film is 0.99 and that of getting 5 negative results
is (0.99)^5. We shall reject the alternative hypothesis if at least one film is positive, i.e.


β = 1 − (0.99)^5 = 0.04901.


The reader will find it instructive to explore the outcome of different test criteria based
on different sizes of sample (i.e. numbers of films per test subject) within the framework of
the foregoing assumptions. In this context P_a in (i) above is the incidence of tuberculosis
in the population. Our truth equation is


P_t = 1 − β − (α − β)P_a
= 1 − 0.04901 − (0.01024 − 0.04901)P_a
= 0.95099 + (0.03877)P_a.


Thus the test criterion ensures a probability of overall correct decision a little more than 95.1
per cent. if the incidence of the disease is 1 per cent., and must inevitably ensure a figure more
than 95 per cent. for an overall correct verdict in accordance with (xi) of 20.03. However, we
shall now see that unconditional assertions of this sort are of trivial interest in connection with 
decisions of this kind, though the procedure illustrated may well have applications in the 
domain of differential diagnosis. 
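
The exploration suggested to the reader can be sketched in a few lines. The function names and the generalisation to "at least k positive films" are assumptions introduced here for illustration; the per-film rates (40 per cent. misses, 1 per cent. false positives) are those of the text, and films are treated as independent.

```python
from math import comb


def film_rule_risks(n_films, min_positive=1, miss_rate=0.4, false_positive_rate=0.01):
    """Risks for the rule 'deem disease present if at least min_positive films are positive'."""

    def at_least(k, n, p):
        # Binomial probability of k or more positives among n independent films.
        return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

    detect_rate = 1.0 - miss_rate
    alpha = 1.0 - at_least(min_positive, n_films, detect_rate)  # sufferer deemed healthy
    beta = at_least(min_positive, n_films, false_positive_rate)  # healthy deemed sufferer
    return alpha, beta


def p_correct(alpha, beta, incidence):
    """Truth equation (i) with P_a equal to the incidence of the disease."""
    return 1.0 - beta - (alpha - beta) * incidence


alpha, beta = film_rule_risks(5)
print(f"5 films: alpha = {alpha:.5f}, beta = {beta:.5f}")
print(f"P_t at 1 per cent. incidence = {p_correct(alpha, beta, 0.01):.5f}")

for n_films in (3, 5, 8):
    a, b = film_rule_risks(n_films)
    print(n_films, round(a, 5), round(b, 5))
```

With more films and the same rule, α falls while β rises, so that neither the number of films nor the rule can be chosen without weighing one risk against the other.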

Throughout our treatment of the model situations explored in this section we have assumed
that the fly cultures are composite ; and that P_a, P_b, etc., specify existent subpopulations at risk.
We have then the model situation to which Bayes’ theorem refers. In an actual situation we
might not know whether the culture is homogeneous or composite. Alternatively, we might
know that the culture contains flies of only one sort without knowing which. From the viewpoint
of Bayes’ theorem, we might then state that one value of P_h is unity and every other value of P_h
is zero ; but we should know the answer to our problem only if we could identify the hypothesis
H which assigns P_h = 1. The test procedure for two exclusive alternative hypotheses sidesteps
both horns of the Bayes’ dilemma. If we know the culture is composite, we need not know the
contribution of each population at risk. If we know that it is homogeneous or if we merely
know that it may be, the same recipe holds good, since two relations hold good for all values of
P_a and P_b including as a limiting case P_a = 0, P_b = 1 or vice versa. In situations to which
the Bayes’ balance sheet is factually relevant, each hypothesis being referable to an existent
population at risk, and in situations to which it is factually irrelevant in the sense that we cannot
realistically conceptualise the sampling process in two stages, it is equally true that our uncertainty
safeguard (P_f) lies between α and β, being equal to α if we design a trial to make α = β.


* The medical specialist will recognise some arbitrary assumptions in the argument here advanced for illustrative
purposes alone.

† The writer is greatly indebted to Mr. R. Wrighton for suggestions and criticisms incorporated in this and the
ensuing section. Since it went to press he has called my attention to two important contributions which lay bare
the limitations of the test procedures commonly employed in biological and sociological work. Jackson (Stat. Res.
Mem. I, 1936) introduces the concept of stringency, a test being most stringent if it assigns a minimal unconditional
uncertainty safeguard (Wrighton) as here defined. V. Mises (Ann. Math. Stat. 14, 238) uses the term error chance
in the same sense and speaks of the success rate of a test as equivalent to P_t in the foregoing discussion. The term
stochastic credibility suggests itself as of wider applicability in the common domain of test procedure and estimation.


20.04 THE CONCEPT OF TEST POWER 


In the foregoing section we have examined a model situation, viz. a fruitfly culture, to throw
light on the relevance of test procedure to unconditional inference, i.e. our concern has been 
to assign a probability to a correct decision for or against a hypothesis. On the assumption 
that the female deemed to be normal in this context is a new and valuable mutant, we might 
also formulate our problem in terms of conditional inference. Thus we may wish to curtail 
both the risk of letting lethal genes accumulate in our stock and the risk of destroying normal 
stock otherwise available for perpetuating it. Accordingly, we decide to screen our females 
by setting up a rejection criterion which will set an acceptable limit to the risk incurred in 
retaining a lethal carrier and an acceptable limit to the risk of losing an otherwise normal female 
which carries the mutant gene we seek to perpetuate. 

We can likewise, and usefully, regard the issue at stake in a diagnostic test such as the one 
Neyman cites as on all fours with decisions which arise in quality control, when the end in view 
is to ensure against incurring hazards respectively (vide 20.02) designated as producer's 
risk and consumer’s risk. The main preoccupation of the administration in the situation 
last discussed will in fact have less to do with an overall assessment of correct judgment 
than with the penalties of making mistakes of two sorts. To classify wrongly a tuberculous 
person is to deprive him or her of proper treatment. To classify wrongly a healthy person 
is likely to cause unjustifiable alarm and despondency. A test procedure which prescribes 
that neither risk exceeds what the authorities regard as admissible therefore satisfies the 
practical demands of the situation from their viewpoint. We may state these demands 
explicitly in the form : 

(i) if the test subject is tuberculous, the risk of erroneous diagnosis must not exceed α ;
(ii) if the test subject is healthy, the risk of erroneous judgment must not exceed β.
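
Demands of this conditional form can be checked against the film test of 20.03 without any reference to the incidence of the disease. In the sketch below the function name and the admissible limits are hypothetical; the per-film rates and the rule "at least one positive film" are those of the earlier example.

```python
def admissible_film_counts(max_alpha, max_beta, miss_rate=0.4,
                           false_positive_rate=0.01, max_films=20):
    """Numbers of films for which neither conditional risk exceeds its limit.

    Rule assumed: deem the disease present if at least one film is positive.
    """
    admissible = []
    for n_films in range(1, max_films + 1):
        alpha = miss_rate ** n_films                          # sufferer, all films negative
        beta = 1.0 - (1.0 - false_positive_rate) ** n_films   # healthy, some film positive
        if alpha <= max_alpha and beta <= max_beta:
            admissible.append(n_films)
    return admissible


# Hypothetical administrative limits on the two risks.
print(admissible_film_counts(max_alpha=0.02, max_beta=0.05))   # [5]
print(admissible_film_counts(max_alpha=0.001, max_beta=0.05))  # []
```

With the first pair of limits only the 5-film test satisfies both demands; with the stricter limit on α no number of films does, and the rule itself would have to change.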


Any unconditional statement we can legitimately make in this context presupposes the 
possibility of classifying the test subjects exclusively as of one or other type ; but the adminis- 
trative intention does not change, if we postulate that an appreciable number of test subjects 
are unclassifiable by recourse to any available independent diagnostic criterion. The test 
need not then lead to consequences embarrassing to authorities content to disclaim responsibility 
for individuals unless definitely deemed to be healthy or tuberculous. Undoubtedly, there 
will arise in administration many comparable costing situations in which a conscientious claim 
for limiting the requirements of a test to such conditional assertions is admissible ; but the 
propriety of such a procedure in the domain of scientific research is at least open to debate. 

Recent literature on quality control techniques justifies the suspicion that some writers 
would advance the claim that conditional decision tests of this type are appropriate in the domain 
of the prophylactic or therapeutic trial. It is therefore pertinent to examine the relevance of 
the analogy between the end in view of the salesman and that of the research worker approaching 
a clinical trial against the background of laboratory experiments in vitro or on animals. In 
this situation the investigator will not lightly incur the risk of losing credit for a major discovery 
nor cheerfully shoulder the risk that subsequent enquiry will discredit his conclusions. If 
content to follow the practice of the large-scale commercial corporation, he will therefore invoke 
a test procedure which will set appropriate limits to the risk of wrongly rejecting the alternatives : 
(i) his own assertion that treatment B guarantees b per cent. more cures than treatment A; 
(ii) the assertion of an imaginary critic that treatment B guarantees only a per cent. more cures 
than treatment A. By all too easy stages, statistical inspection then becomes a recipe for statistical 
careerism. The investigator and his putative opponent relinquish their proper relation as 
colleagues in the impersonal pursuit of truth to embrace a convention which safeguards the 
amour propre of each. The decision to make the best of a bad job in this sense involves an 
ethical issue which is not amenable to arguments likely to win universal assent ; but it carries 
with it an implication which may well damp the enthusiasm of the convert. This will come 
into focus, if we here digress to clarify the Neyman-Pearson concept of test power. 

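In the taxonomic domain of the 2-class universe we specify α and β in the following way 
for the r-fold sample when the criterion for rejecting hypothesis A (p = p_a) and hence for 
accepting hypothesis B (p = p_b) is x ≥ t : 

$$\alpha = \sum_{x=t}^{r} \binom{r}{x} p_a^{x}(1-p_a)^{r-x}\,; \qquad 1-\alpha = \sum_{x=0}^{t-1} \binom{r}{x} p_a^{x}(1-p_a)^{r-x} \qquad \text{(i)}$$

$$\beta = \sum_{x=0}^{t-1} \binom{r}{x} p_b^{x}(1-p_b)^{r-x}\,; \qquad 1-\beta = \sum_{x=t}^{r} \binom{r}{x} p_b^{x}(1-p_b)^{r-x} \qquad \text{(ii)}$$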

What Neyman calls the power function F(p) of the test for the same size (r) of sample 
and the same criterion score (t) is picturable as the graph of the following function over the 
range p = 0 to p = 1 : 

$$F(p) = \sum_{x=t}^{r} \binom{r}{x} p^{x}(1-p)^{r-x} \qquad \text{(iii)}$$

It follows that 

$$F(p_a) = \alpha \quad\text{and}\quad F(p_b) = 1-\beta \qquad \text{(iv)}$$

For a given value of r and of t, the condition that α = β is, of course, 

$$F(p_a) = 1 - F(p_b) \qquad \text{(v)}$$


Having fixed any criterion for rejection of the null hypothesis (A), and having chosen 
the alternative hypothesis (p = p_b), we speak of F(p_b) as the power of the test. This, being 
(1 − β), is the probability of rejecting the null hypothesis when it is false, on the assumption 
that the alternative is the only admissible one. One test prescription is more powerful than 
another if it has a higher power in this sense for a fixed value of α, i.e. if it assigns a lower 
probability to error of the second kind for the same probability of error of the first. If the two 
test prescriptions both invoke the same distributions, the test which employs a larger sample 
must be the more powerful one. 

The reader will find it instructive to plot F(p) against p for the following example of a test 
procedure. The null hypothesis is that p = ½ when r = 144. The rejection criterion is x > 82.5. 
For the distribution prescribed by the null hypothesis the mean is 72 with σ = 6. Whence 
the criterion score in standard form is (82.5 − 72) ÷ 6 = 1.75. This excludes 4 per cent. 
of the area of the normal fitting curve, i.e. α = 0.04. For this set-up we may tabulate as below : 


TABLE 1 

   p        M     X = (82.5 − M)      σ       c = X ÷ σ       β       F(p) = (1 − β) 
  5/12      60         22.5         5.9161      3.8032      >0.999       <0.001 
 11/24      66         16.5         5.9791      2.7596       0.996        0.004 
 13/24      78          4.5         5.9791      0.7526       0.773        0.227 
  7/12      84         −1.5         5.9161     −0.2535       0.401        0.599 
  5/8       90         −7.5         5.8095     −1.2910       0.099        0.901 
  2/3       96        −13.5         5.6568     −2.3865       0.013        0.987 
 17/24     102        −19.5         5.4544     −3.5750      <0.001       >0.999 
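
The tabulation can be retraced mechanically. The following is a minimal sketch, not the author's own procedure: it evaluates F(p) for r = 144 and the criterion x > 82.5 both by the normal fitting curve used in the text and by direct summation of binomial terms. The function names are illustrative assumptions, and the values it prints need not agree with the tabulated entries in every decimal place, since the latter come from printed tables of the normal integral.

```python
from math import comb, erf, sqrt

R = 144          # sample size used in the text
BOUNDARY = 82.5  # rejection criterion x > 82.5, i.e. x >= 83


def normal_cdf(z):
    """Area of the unit normal curve from minus infinity to z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def power_normal(p, r=R, boundary=BOUNDARY):
    """Normal approximation to F(p): area above the standardised boundary."""
    mean = r * p
    sigma = sqrt(r * p * (1.0 - p))
    return 1.0 - normal_cdf((boundary - mean) / sigma)


def power_exact(p, r=R, boundary=BOUNDARY):
    """Exact binomial F(p): probability that the score exceeds the boundary."""
    t = int(boundary) + 1  # smallest whole score above the boundary
    return sum(comb(r, x) * p**x * (1.0 - p) ** (r - x) for x in range(t, r + 1))


# One line per hypothetical mean M = r * p, including the null value M = 72.
for m in (60, 66, 72, 78, 84, 90, 96, 102):
    p = m / R
    print(f"M = {m:3d}  normal F(p) = {power_normal(p):.3f}  exact F(p) = {power_exact(p):.3f}")
```

Plotting either column against p gives the S-shaped power curve which the text invites the reader to draw.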


The concept of test power is easily interpretable in the alternative domain of representative 
scoring. The simplest type of test is then one which invokes alternative hypotheses specifying 
the mean score of the u.s.d. for each of two normal universes. Consider now the following 
model. We do not know the mean value (M) of the normal variate ; but we do know that the 
variance of the u.s.d. is 2500. Whence that (σ_m²) of the mean of the r-fold sample is 2500 ÷ r. 
For the 100-fold sample σ_m = 5. Our null hypothesis is that M = 18.2. The standard score 
corresponding to a sample value (M_x) of the mean is therefore (M_x − 18.2) ÷ 5. To make 
α = 0.05 we must make the deviation equal to 1.64σ_m, i.e. (M_x − 18.2) = 8.2, whence M_x = 26.4. 

The alternative hypothesis which makes β = 0.05 is that (M_x − M) ÷ σ_m = −1.64, 
so that (26.4 − M) = −5(1.64). Whence the hypothesis is that M = 26.4 + 8.2 = 34.6. 
If our alternative hypothesis were that M = 28.2, the score deviation would be (26.4 − 28.2) = 
−1.8 or −0.36σ_m. At this level β = 0.359. To make the two risks equal when the sample 
size is 100 and the alternative hypothesis is M = 28.2 we must choose the sample value (M_x) 
definitive of our rejection criterion, so that 

$$\frac{M_x - 18.2}{5} = -\frac{(M_x - 28.2)}{5}$$

In this case M_x = 23.2, i.e. (M_x − 18.2) = 5 = σ_m, so that α = 0.159 = β. To equalise 
both risks as nearly as possible at the level α = 0.05 = β, when the alternative hypothesis is 
that M = 28.2, we must enlarge our sample size (r) so that σ_m = 50 ÷ √r in the identity 

$$\frac{23.2 - 18.2}{\sigma_m} = 1.64 = -\frac{(23.2 - 28.2)}{\sigma_m}$$

Whence we get 
√r = 16.4. 
Whence r = 269 to the nearest integer. 
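
The arithmetic of this example is short enough to retrace in a few lines. What follows is only a sketch under the assumptions of the text (unit sampling variance 2500, null mean 18.2, alternative mean 28.2, and 1.64 as the 5 per cent. deviate); the variable names are illustrative.

```python
from math import erf, sqrt

SIGMA_UNIT = 50.0   # square root of the u.s.d. variance (2500)
H = 1.64            # standard deviate used in the text for a 5 per cent. tail
M_NULL = 18.2       # mean asserted by the null hypothesis
M_ALT = 28.2        # mean asserted by the alternative hypothesis


def normal_cdf(z):
    """Area of the unit normal curve from minus infinity to z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def sigma_m(r):
    """Standard deviation of the mean of an r-fold sample."""
    return SIGMA_UNIT / sqrt(r)


# Criterion fixing alpha = 0.05 for the 100-fold sample.
criterion = M_NULL + H * sigma_m(100)

# Risk of the second kind if the alternative is M = 28.2.
beta = normal_cdf((criterion - M_ALT) / sigma_m(100))

# Criterion which equalises the two risks without changing the sample size.
equal_criterion = 0.5 * (M_NULL + M_ALT)
equal_risk = 1.0 - normal_cdf((equal_criterion - M_NULL) / sigma_m(100))

# Sample size at which both risks fall to 0.05.
r_needed = (2.0 * H * SIGMA_UNIT / (M_ALT - M_NULL)) ** 2

print(f"criterion = {criterion:.1f}, beta = {beta:.3f}")
print(f"equal-risk criterion = {equal_criterion:.1f}, alpha = beta = {equal_risk:.3f}")
print(f"sample size for alpha = beta = 0.05: {round(r_needed)}")
```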
We may generalise the rules of test prescription thus : 

(i) to fix α at the h_a σ_m level we make our test criterion 

$$\frac{t - M_a}{\sigma_m} = h_a, \quad\text{so that}\quad t = M_a + h_a\sigma_m \qquad \text{(vi)}$$

(ii) to determine β in terms of h_b σ_m we then have 

$$\frac{t - M_b}{\sigma_m} = h_b \qquad \text{(vii)}$$

(iii) to equalise the two risks without changing the size of the sample we make 

$$\frac{t_0 - M_a}{\sigma_m} = -\frac{(t_0 - M_b)}{\sigma_m}, \quad\text{so that}\quad t_0 = \tfrac{1}{2}(M_a + M_b) \qquad \text{(viii)}$$

(iv) for equal risks we specify α in terms of h_a σ_m by the relation 

$$\frac{t_0 - M_a}{\sigma_m} = h_a, \quad\text{so that}\quad h_a = \frac{M_b - M_a}{2\sigma_m} \qquad \text{(ix)}$$

(v) to equalise risks at the level α = 0.05 = β (so that h_a = 1.64 = −h_b), we must 
change the size of sample from r_1 to r_2 so that the new value of σ_m is r_1^{1/2} r_2^{−1/2} σ_m and 

$$1.64 = \frac{(M_b - M_a)\sqrt{r_2}}{2\sigma_m\sqrt{r_1}}, \quad\text{so that}\quad r_2 = \frac{(3.28)^2\,\sigma_m^2\, r_1}{(M_b - M_a)^2} \qquad \text{(x)}$$

The reader will easily adapt the foregoing to the situation in which we require two parameters 
to specify the normal universe of a hypothesis, viz. M_h and σ_h. For the simpler case 
under consideration, the power of the test being (1 − β) for any alternative to the null hypothesis 
is expressible in the form 

$$h = \frac{t - M_h}{\sigma_m} \qquad \text{(xi)}$$

We may then write 

$$P_t = \frac{1}{\sqrt{2\pi}}\int_{h}^{\infty} e^{-\frac{1}{2}c^2}\,dc \qquad \text{(xii)}$$


We may now make an exploratory table for test design as follows on the assumption that r = 100, 
σ_m = 5 and t = 26.4, so that α = 0.05 when M_a = 18.2 : 


TABLE 2 

     (i)             (ii)             (iii)          (iv)           (v)               (vi) 
  Value of M       Level of       Corresponding    Power of     Value of α        Value of r 
  definitive of    rejection (h)  value of β       test         (= β) when        when 
  alternative      expressed as                    (1 − β)      criterion         α = 0.05 = β 
  hypothesis       hσ_m                                         equalises risks 
     18.4            −1.6            0.945          0.055          0.492            672,400 
     19.4            −1.4            0.919          0.081          0.452             18,678 
     22.4            −0.8            0.788          0.212          0.337              1,525 
     24.4            −0.4            0.655          0.345          0.268                700 
     26.4             0.0            0.500          0.500          0.206                400 
     28.4             0.4            0.345          0.655          0.154                259 
     30.4             0.8            0.212          0.788          0.111                181 
     32.4             1.2            0.115          0.885          0.078                133 
     34.4             1.6            0.055          0.945          0.053                102 
     36.4             2.0            0.023          0.977          0.034                 81 
     38.4             2.4            0.008          0.992          0.022                 66 

Since the size of sample fixes the power of the test for a fixed value of α, we can set β at 
a predestined level appropriate to any single chosen alternative of the null hypothesis only if 
we plot P_t for different values of r. Table 3 sufficiently illustrates the procedure ; and the 
reader may find it instructive to check the arithmetic by recourse to the foregoing equations. 
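
One way to make that check is to recompute a row of Table 2 from rules (ix) and (x). The sketch below assumes the set-up stated for the table (null mean 18.2, σ_m = 5 for the 100-fold sample, criterion 26.4) and follows the sign convention of column (ii), in which the level of rejection is reckoned as (M − t) ÷ σ_m. The function name `table_row` is an illustrative assumption.

```python
from math import erf, sqrt

M_NULL = 18.2     # mean asserted by the null hypothesis
SIGMA_M = 5.0     # standard deviation of the mean of the 100-fold sample
R1 = 100          # sample size to which SIGMA_M refers
H = 1.64          # deviate for a 5 per cent. tail, as in the text
T = M_NULL + H * SIGMA_M  # rejection criterion (26.4)


def normal_cdf(z):
    """Area of the unit normal curve from minus infinity to z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def table_row(m_alt):
    """Return columns (ii) to (vi) of Table 2 for one alternative mean."""
    h = (m_alt - T) / SIGMA_M                        # column (ii), sign as tabulated
    beta = normal_cdf(-h)                            # column (iii)
    power = 1.0 - beta                               # column (iv)
    h_equal = (m_alt - M_NULL) / (2.0 * SIGMA_M)     # rule (ix)
    equal_risk = 1.0 - normal_cdf(h_equal)           # column (v)
    r2 = (2.0 * H * SIGMA_M) ** 2 * R1 / (m_alt - M_NULL) ** 2  # rule (x)
    return h, beta, power, equal_risk, r2


for m_alt in (18.4, 22.4, 26.4, 28.4, 34.4, 38.4):
    h, beta, power, equal_risk, r2 = table_row(m_alt)
    print(f"M = {m_alt:4.1f}  h = {h:+.1f}  beta = {beta:.3f}  "
          f"power = {power:.3f}  equal risk = {equal_risk:.3f}  r = {round(r2):,}")
```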


TABLE 3 


Power function (P_t = 1 − β) for the same model as in Table 2, tabulated separately for different values of sample size 
r, with the same rejection criterion (α = 0.05) for the null hypothesis (M = 18.2), when σ_m = 5 for the 100-fold 
sample. At the head of the columns are score values (M_x) corresponding to the condition α = 0.05, and values of 
σ_m for the appropriate value of r. 


                                   Size of Sample 

  Hypothesis        81              144              256              324 

                M_x = 27.3      M_x = 25.03      M_x = 23.33      M_x = 22.52 

[The σ_m row and the body of this table are not legible in the source.] 


We are now in a position to see more clearly the implications of approaching the inter- 
pretation of the outcome of a prophylactic or therapeutic trial as one of accommodating the 
producer’s risk and the consumer's risk in the theory of quality control. If we do so we conceive 
the test procedure as a game of chance in which the investigator arranges the stakes to accom- 
modate the inclinations of a wholly imaginary contestant. His assertion is that treatment B 
guarantees b per cent. more cures than treatment A, and this fixes the value of b. His fictitious 
opponent asserts that treatment B guarantees only a per cent. more cures; but because his 
opponent is merely a figment of his own fears, all that he can say about a is that: a < b. If he 
conceives that his opponent is ready to deny any operational advantage (a = 0) for treatment B, 
he may set his own risk as equal to that of his opponent at a much lower level for a fixed size of 
sample than will be possible if his opponent makes a far more moderate claim (e.g. a = ½b). 
Alternatively, the design of a test to equalise risks at one and the same level will prescribe the 
availability of larger samples, if he conceives that his opponent, having first denied any advantage, 
will subsequently concede that there is some. 

Having chosen the form of his own assertion (here the numerical value of b), the only 
exclusive admissible alternative with which he can equip the imaginary disputant of his claim 
is a <b; but the alternative test procedure then prescribes recourse to an infinite sample as 
a prerequisite to a firm decision. Within the restricted framework of conditional inference, 
the alternative test procedure can thus offer no simple nor unique recipe for validifying the 
operational advantage claimed for a new procedure. Should the reader find the foregoing 
argument obscure, it may help to clarify the issue, if we go back to the data of Tables 2 and 3 
(in which σ_m = 5 when r = 100). We shall suppose that : (i) disputant A initially asserts 
that M = 18.2 and disputant B initially asserts that M = 26.4 ; (ii) both disputants initially 
agree to accept the outcome of a test which vindicates the claim of B if the 400-fold sample 
value of M_x exceeds 22.3. In that event each takes a 5 per cent. risk of being discredited. We 
now suppose that an arbitrator persuades disputant A to concede that M = 19 and disputant 
B to concede that M = 25.6, still taking equal risk on the outcome of a 400-fold trial. Disputant 
A will still lose his case if M_x > 22.3, but each disputant now incurs a 26.8 per cent. risk 
(α = 0.2676 = β) of getting an adverse verdict. 

From the foregoing heuristic discussion the reader must not infer that the quality control 
procedures permit us to assign uncertainty safeguards only to conditional assertions. This is 
not so, as we shall now see. Throughout this section we have provisionally presumed a distinction 
between conditional and unconditional assertions in terms of the uses to which we put them. 
This is clear-cut in the sense that : (a) any statement worthy to rank in the corpus of scientific 
knowledge is one which we can rightly describe as unconditional in the sense elsewhere defined ; 
(b) statements of the conditional sort suffice as a basis for administrative decision. It is none the 
less possible to formulate rules of decision leading to unconditional statements of a sort rarely, 
if ever, relevant to the domain of research in pure science and no more useful to the administrator 
because more comprehensive in scope than a corresponding statement expressed in the more 
restricted form. Further consideration of the Drosophila model of 20.03 will make this clear. 

In 20.03 we set up two hypotheses H_a that p = ½ = p_a and H_b that p = ⅔ = p_b, p being 
the probability that any offspring of a particular mother will be female. If we make the rule 
to reject H_a if x > a + ½ for the r-fold sample and denote by L_{x.a} the probability that it will 
contain x females if p = p_a, we may assign as the risk (α) of rejection when H_a is true : 

$$\alpha = \sum_{x=(a+1)}^{r} L_{x.a}$$

Similarly we may adopt H_b as our null hypothesis and make the rule to reject it if x < b + ½. 
The corresponding conditional risk (β) of rejection is then 

$$\beta = \sum_{x=0}^{b} L_{x.b}$$

In either case, we attach an uncertainty safeguard (α or β) to a statement which is conditional 
in the sense that it refers to a risk we take of being wrong if a particular hypothesis is correct. 
Unless a = b the simultaneous application of the two tests will not necessarily lead to a decision 
in favour of either hypothesis ; but we can formulate a composite rule which must do so in the 
form : reject H_a if x > k + ½ and reject H_b if x < (k + ½) ; and we may be able to choose 
k so that α ≈ γ ≈ β, if r is fairly large. This leads to a conditional assertion which assigns γ as 
the risk that we shall reject either hypothesis if true ; but we cannot assign an acceptable safeguard 
to any unconditional assertion about the outcome unless the hypotheses so stated constitute 
an exhaustive and exclusive set. We can make a more comprehensive type of statement if we 
restate our hypotheses in the form H_a that p ≤ p_a and H_b that p ≥ p_b ; and may still guarantee 
the termination of the test in a firm decision, if we follow the same composite rule of rejecting 
H_a when x > (k + ½) and rejecting H_b when x < (k + ½). We then define L_{x.a} and L_{x.b} in 
terms of p_a and p_b as before ; and fix k so that 

$$\sum_{x=(k+1)}^{r} L_{x.a} = \alpha \simeq \beta = \sum_{x=0}^{k} L_{x.b}$$

Any value p < p_a then makes the conditional risk of rejecting H_a in its new form less than 
α ; and any value of p > p_b makes the conditional risk of rejecting H_b in its new form less than β, 
which we may assign at any acceptable level if free to prescribe the sample size (r) in advance, 
as in 20.01. The rule itself limits our positive statements to the alternatives p ≥ p_b and p ≤ p_a. 
It prohibits any statement about an interval in which p lies, e.g. any statement of the form 
p_a < p < p_b. To any assertion it does entitle us to make we may attach an uncertainty safeguard 
P_f ≤ α. Since this sets a limit to the probability assignable to any false statement we may 
make, we are entitled to say that P_f ≤ α unconditionally defines the uncertainty safeguard of the 
entire class of statements which the rule subsumes ; but we can state this only because the rule 
subsumes no simultaneous statement concerning the relation of p to both p_a and p_b. 
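
The composite rule lends itself to direct computation. The sketch below is an illustration only: it takes p_a = ½ and p_b = ⅔ for the two target values, sums the binomial terms for the two conditional risks, and searches for the smallest sample size and criterion k which keep both risks within a stated level. The function names and the search limit `r_max` are assumptions introduced for the example.

```python
from math import comb


def binomial_terms(r, p):
    """Probabilities of the scores 0, 1, ..., r in an r-fold sample."""
    return [comb(r, x) * p**x * (1.0 - p) ** (r - x) for x in range(r + 1)]


def prob_above(r, k, p):
    """Probability that x > k + 1/2, i.e. that the rule rejects H_a."""
    return sum(binomial_terms(r, p)[k + 1:])


def prob_below(r, k, p):
    """Probability that x < k + 1/2, i.e. that the rule rejects H_b."""
    return sum(binomial_terms(r, p)[:k + 1])


def smallest_design(level, p_a=0.5, p_b=2.0 / 3.0, r_max=500):
    """Smallest r (with its k) for which neither conditional risk exceeds level."""
    for r in range(1, r_max + 1):
        terms_a = binomial_terms(r, p_a)
        terms_b = binomial_terms(r, p_b)
        for k in range(r + 1):
            alpha = sum(terms_a[k + 1:])
            beta = sum(terms_b[:k + 1])
            if alpha <= level and beta <= level:
                return r, k, alpha, beta
    return None  # no design found within r_max


design = smallest_design(0.05)
if design is not None:
    r, k, alpha, beta = design
    print(f"r = {r}, k = {k}, alpha = {alpha:.4f}, beta = {beta:.4f}")
    # Values of p outside the interval only reduce the corresponding risk.
    print(f"risk of rejecting H_a if p = 0.4: {prob_above(r, k, 0.4):.4f}")
    print(f"risk of rejecting H_b if p = 0.8: {prob_below(r, k, 0.8):.4f}")
```

The last two lines of output illustrate why the restated hypotheses p ≤ p_a and p ≥ p_b leave the safeguard intact: the risk is greatest at the boundary values themselves.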

If we know that the Drosophila culture contains several different genotypes to which we 
can assign values of p, we can meaningfully postulate prior probabilities referable to existent 
populations at risk to formalise the unconditional character of the final statement which the rule 
endorses. We must do so with due regard to its content, viz. the probability of wrongly rejecting 
the hypothesis p_a < p < p_b is zero, since the rule does not allow us to reject it. We may then 
set out the argument in terms of the following symbols, e being positive : 


     Hypothesis            Prior Probability        Conditional Risk 
  1. p < p_a                     P_1               P_{f.1} = α − e_1 
  2. p = p_a                     P_2               P_{f.2} = α 
  3. p_a < p < p_b               P_3               P_{f.3} = 0 
  4. p = p_b                     P_4               P_{f.4} = α 
  5. p > p_b                     P_5               P_{f.5} = α − e_5 


These hypotheses constitute an exclusive set among which we can accept only one. Hence the 
addition rule applies, and our unconditional uncertainty safeguard is 

$$P_f = P_1 P_{f.1} + P_2 P_{f.2} + P_3 P_{f.3} + P_4 P_{f.4} + P_5 P_{f.5}$$
$$= P_1(\alpha - e_1) + P_2\,\alpha + P_4\,\alpha + P_5(\alpha - e_5)$$
$$= (1 - P_3)\,\alpha - P_1 e_1 - P_5 e_5$$
$$\therefore\; P_f < \alpha$$

The prescription of such a rule presupposes two target values of p which we can readily 
conceive in relation to standards of quality and to costing limits in an executive set-up, but the 
unconditional form the terminal statement assumes when we formulate a rule in this way embodies 
no relevant information other than the content of two types of conditional assertion. What the 
choice of the single rejection-acceptance criterion k accomplishes is that the inspection plan 
itself achieves its task, i.e. the test must lead to the decision to reject either H_a or H_b. In 
fact, both hypotheses may be wrong ; and the unconditional form of the assertion is realisable 
only because the test can never lead to a corresponding assertion, i.e. a statement to the effect 
p_a < p < p_b. 

If we operate within the framework of a single hypothesis stated in the form p ≤ p_a or 
p ≥ p_b, and have defined our rejection criterion so that P_f ≤ α is the probability of rejecting it 
when true, we are free to limit our verdicts, as R. A. Fisher does indeed prescribe, to the 
alternatives : hypothesis false and hypothesis unproven. In the sense that P_f ≤ α is then the 
probability of erroneously making an allowably decisive assertion, we might admittedly say that 
P_f ≤ α is the unconditional safeguard of our test procedure. We then evade the Neyman-Pearson 
error of the second kind by exposing ourselves to situations in which the overwhelming 
majority of our decisions will assign the verdict unproven to a false null hypothesis. We can 
indeed avoid doing so only by prescribing sample size with due regard to the Neyman-Pearson 
concept of test power ; but any attempt to rehabilitate the Yule-Fisher significance test on such 
terms undermines previous claims concerning reliability of inference referable to small samples. 
We shall examine the implications of the last statement more fully in 20.08. 


20.05 A SEQUENTIAL TEST PROCEDURE 


We are now in a position to reinterpret the issue raised at the end of 20.02, i.e. the interpretation 
of Sequential Ratio limits ; and shall do so with special reference to the type of alternative test 
procedure called the double dichotomy due to Wald. It will help us to understand its rationale 
if we first briefly examine a unique null hypothesis test which employs the same principle. 
Wald’s non-sequential so-called exact test presupposes that we can pair off unit samples from 
each of two universes which may or may not have the same composition w.r.t. a particular 
criterion of classification. For instance, we may take cards with replacement one at a time 
from each of two packs I and II, not necessarily complete, setting out the results as below: 


                   From I     From II     No. of pairs     Total 
  Concordances     Red        Red              30 
                   Black      Black            16            46 
  Discordances     Red        Black            12 
                   Black      Red              24            36 


The null hypothesis is that the two packs I and II are identical w.r.t. colour composition. 
If so, the two possible types of discordant pairs must occur with equal frequency in the long 
run. If pack II contains a higher proportion of red cards, pairs of type B − R should occur 
more often than pairs of type R − B. Accordingly, we disregard the concordant pairs, adopting 
as our null hypothesis that the unit trial probabilities p of getting B − R and q of getting R − B 
are equal, i.e. p = ½ = q. The expected value for a sample of 36 is therefore 18 for each type. 
If p = ½ = q the variance of the raw-score distribution of the 36-fold sample of discordant 
pairs is 36 × ½ × ½ = 9 so that σ = 3. Now the deviation of the observed number of B − R 
pairs from its expected value is 24 − 18 = 6. To evaluate the probability that the deviation 
will be as great as + 6 we require the area of the histogram in the range from 0 to 23. For 
samples of 16 or over the normal curve of unit variance gives a good quadrature approximation 
(areal fit) for the terms of the binomial (½ + ½)^r if we make the half interval correction. Thus 
our concern is with the area of the normal curve of unit variance from − ∞ to 

$$\frac{23\tfrac{1}{2} - 18}{3} = 1.83$$

The table of the normal integral shows that the area so defined is 0.966. If the null hypothesis 
is true the odds are therefore about 30 : 1 against getting a score of 24 or more B − R pairs in 
a 36-fold sample of discordant pairs. If we wish to test whether m discordant pairs of a given 
type exceed expectation (½r) significantly, we may cite the exact probability of getting a score of 
m or more as 

$$\sum_{x=m}^{r} \binom{r}{x}\left(\tfrac{1}{2}\right)^{r} = \frac{r!}{2^{r}}\sum_{x=m}^{r}\frac{1}{x!\,(r-x)!}$$

With due regard to the half interval correction, the appropriate critical ratio (standard score) is 

$$c_m = \frac{m - \tfrac{1}{2} - \tfrac{1}{2}r}{\tfrac{1}{2}\sqrt{r}} = \frac{2m - 1 - r}{\sqrt{r}}$$

If r is larger than 16 we may safely use the figure given in the normal table for the area from 
− ∞ to c_m to evaluate the probability that the score will be less than m if the null hypothesis 
is true. 
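
The card-pack example can be checked both ways. The following sketch, in which the function names are illustrative assumptions, computes the exact probability of 24 or more pairs of one type among 36 discordant pairs on the null hypothesis p = ½, and sets it beside the figure given by the normal curve with the half interval correction.

```python
from math import comb, erf, sqrt


def normal_cdf(z):
    """Area of the unit normal curve from minus infinity to z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def exact_upper_tail(m, r):
    """Exact probability of m or more successes among r discordant pairs (p = 1/2)."""
    return sum(comb(r, x) for x in range(m, r + 1)) / 2**r


def critical_ratio(m, r):
    """Standard score with the half interval correction: (m - 1/2 - r/2) / (sqrt(r)/2)."""
    return (2 * m - 1 - r) / sqrt(r)


m, r = 24, 36
c = critical_ratio(m, r)
area_below = normal_cdf(c)          # probability that the score is less than m
approx_tail = 1.0 - area_below      # normal approximation to the upper tail
exact_tail = exact_upper_tail(m, r)

print(f"critical ratio = {c:.2f}, area below = {area_below:.3f}")
print(f"upper tail: normal approximation = {approx_tail:.4f}, exact = {exact_tail:.4f}")
```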


Double Dichotomy Sequential Test. Wald’s non-sequential test last described is a test 
which involves a unique null hypothesis. A sequential test based on the same method of 
classifying the data, i.e. rejection of all concordant pairs, presupposes a hypothesis (p_b = m p_a = ½m) 
alternative to the null hypothesis (p_a = ½). If we designate discordant pairs of two series 
A and B respectively as successes (− +) and failures (+ −) our alternative hypothesis is 
then that the probability of success so defined is ½m. Since we can label our pairs at choice 
we may formulate the alternative on the assumption that successes are more common than 
failures, i.e. 1 < m < 2. 

If x is the number of pairs labelled as successes, we may then proceed to define a 
sequential ratio in the usual way for an r-fold sequence of pairs as 

$$S_r = \frac{p_a^{x}\, q_a^{r-x}}{p_b^{x}\, q_b^{r-x}} = \frac{1}{m^{x}(2-m)^{r-x}}$$

The reader will recall the prescription for any S.R. test is as follows. We continue to 
enlarge the sample until 


either S_r ≥ A in which event we accept the null hypothesis ; 
or S_r ≤ B in which event we reject the null hypothesis. 

Let us now recall the pattern of an S.R. test as set out in 20.02. Though we define the 
criteria of rejection in terms of the numerical value the ratio S_r reaches, any prescribed value 
S_r = A referable to the completed r-fold sample presupposes that the corresponding raw score 
has not exceeded the particular value x = a, and any prescribed value S_r = B presupposes that 
the corresponding raw score has attained a particular value x = b. In short, the test prescription 
implies that we terminate the test for the value of r at which S_r ≥ A or S_r ≤ B by 


accepting the null hypothesis if x ≤ a 
rejecting the null hypothesis if x ≥ b. 

When x = a and S_r = A, our alternative hypotheses prescribe 

$$S_r = \frac{p_a^{a}\, q_a^{r-a}}{p_b^{a}\, q_b^{r-a}} = A \qquad \text{(i)}$$

When x = b and S_r = B, our alternative hypotheses prescribe 

$$S_r = \frac{p_a^{b}\, q_a^{r-b}}{p_b^{b}\, q_b^{r-b}} = B \qquad \text{(ii)}$$

We may make the alternative hypotheses more comprehensive with a view to an unconditional 
form of statement en rapport with the argument of pp. 878-9 above, if we put p_a ≤ ½ and p_b ≥ ½m 
without affecting the choice of a and b implicit in the rejection criteria S_r = A and S_r = B. 
For a given sample size (r) we may likewise define a and b in terms of the conditional risk 
(P_{f.a} ≤ α) of erroneously rejecting and the conditional risk (P_{f.b} ≤ β) of erroneously accepting 
the null hypothesis, i.e. rejecting the alternative when true. Thus we may write 

$$\alpha = \sum_{x=(a+\frac{1}{2})}^{r} L_{x.a} \quad\text{and}\quad \beta = \sum_{x=0}^{(b-\frac{1}{2})} L_{x.b} \qquad \text{(iii)}$$

When p_a = ½ and p_b = ½m 

$$\alpha = 2^{-r}\sum_{x=(a+\frac{1}{2})}^{r}\binom{r}{x}\,; \qquad \beta = 2^{-r}\sum_{x=0}^{(b-\frac{1}{2})}\binom{r}{x} m^{x}(2-m)^{r-x}$$

$$(1-\alpha) = 2^{-r}\sum_{x=0}^{(a+\frac{1}{2})}\binom{r}{x}\,; \qquad (1-\beta) = 2^{-r}\sum_{x=(b-\frac{1}{2})}^{r}\binom{r}{x} m^{x}(2-m)^{r-x}$$


Since α and β suffice to define x = a and x = b for the value of r at which the test terminates, 
we may hope to fix S_r = A and S_r = B to ensure that the unconditional uncertainty safeguard 
(P_f) for a decision one way or the other exceeds preassigned values of neither α nor β. The 
fact that the test may terminate when p_a < p < p_b is then irrelevant, since our verdicts are of 
only two sorts, p ≤ p_a or p ≥ p_b. The clue to the relation we seek is that : (i) β and (1 − α) 
are each referable to advancing totals of a binomial summation, i.e. from 0 up to b or from 0 up 
to a ; (ii) α and (1 − β) are each referable to receding totals, i.e. from r down to (a + 1) or 
from r down to (b + 1). We shall now state, but without proof at this stage, a rule connecting 
the sequential ratio with the two classes of totals : (a) the ratio of corresponding advancing total 
frequencies is greater than or equal to the terminal sequential ratio (p_a^x q_a^{r−x} ÷ p_b^x q_b^{r−x}) of the 
sequence ; (b) the ratio of corresponding receding total frequencies is always less than or equal 
to the initial sequential ratio of the sequence. The following numerical example for the 4-fold 
sample distribution when p_a = ½ and p_b = ¾ illustrates the meaning of this assertion, y_a and y_b 
being relative frequencies on the same scale (4⁴) : 


  x                                  0       1       2       3       4 

  y_a (p_a = ½)                     16      64      96      64      16 

  y_b (p_b = ¾)                      1      12      54     108      81 

  Advancing Totals   (y_a)          16      80     176     240     256 

                     (y_b)           1      13      67     175     256 

  Receding Totals    (y_a)         256     240     176      80      16 

                     (y_b)         256     255     243     189      81 

  Ratio of Advancing Totals         16     6.2     2.6     1.4       1 
  Ratio of Receding Totals           1    0.94    0.72    0.42    0.20 
  Sequential Ratio                  16     5.3     1.8    0.59    0.20 
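
The entries of this example are whole numbers on the scale 4⁴ = 256, so that the rule can be verified exactly. The sketch below rebuilds the rows with integer and rational arithmetic; the variable names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import accumulate
from math import comb

R = 4  # the 4-fold sample of the example

# Relative frequencies on the common scale 4**4 = 256.
y_a = [comb(R, x) * 2**R for x in range(R + 1)]   # p_a = 1/2
y_b = [comb(R, x) * 3**x for x in range(R + 1)]   # p_b = 3/4

# Advancing totals run from score 0 upwards; receding totals from score r downwards.
adv_a = list(accumulate(y_a))
adv_b = list(accumulate(y_b))
rec_a = list(accumulate(y_a[::-1]))[::-1]
rec_b = list(accumulate(y_b[::-1]))[::-1]

seq_ratio = [Fraction(a, b) for a, b in zip(y_a, y_b)]
adv_ratio = [Fraction(a, b) for a, b in zip(adv_a, adv_b)]
rec_ratio = [Fraction(a, b) for a, b in zip(rec_a, rec_b)]

for x in range(R + 1):
    print(f"x = {x}: advancing {float(adv_ratio[x]):.2f}  "
          f"receding {float(rec_ratio[x]):.2f}  sequential {float(seq_ratio[x]):.2f}")
```

At every score the advancing ratio is at least as large as the sequential ratio, and the receding ratio is no larger, which is the rule stated in the text.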


From (i) and (ii) above we therefore see that 

$$\frac{1-\alpha}{\beta} \ge A \quad\text{and}\quad \frac{\alpha}{1-\beta} \le B \qquad \text{(iv)}$$

To make the meaning of these inequalities clear, let us first notice that diminishing either α or β 
makes the left hand ratio larger and the right hand ratio smaller, e.g. 

                    α = 0.10      α = 0.05      α = 0.05      α = 0.025 
                    β = 0.05        = β         β = 0.025       = β 

  (1 − α) ÷ β          18            19            38            39 

  α ÷ (1 − β)         2/19          1/19          2/39          1/39 


Let us now fix A = 20 as our acceptance criterion for the hypothesis p ≤ p_a and B = 1/20 as our 
alternative criterion for acceptance of the hypothesis p ≥ p_b. In accordance with (iv), we have 
then said that 

$$\frac{1-\alpha}{\beta} \ge 20 \quad\text{and}\quad \frac{\alpha}{1-\beta} \le \frac{1}{20}$$

We see from the above that this implies that α and β must then both be less than 0.05. By 
tabulating (1 − α)β⁻¹ and α(1 − β)⁻¹ for descending values of α and descending values of β we 
can thus see what sequential ratio limits are consistent with assigning our conditional risks at 
prescribed levels P_{f.a} ≤ α and P_{f.b} ≤ β. Since the test permits us to make only two sorts 
of statement the overall risk of erroneous decision is P_f ≤ α if α > β and P_f ≤ β if β > α. 
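
The tabulation suggested here is easily mechanised. The sketch below uses exact fractions, so that the ratios appear as 2/19, 1/39 and so on; the helper names `ratio_bounds` and `consistent` are assumptions introduced for the illustration.

```python
from fractions import Fraction


def ratio_bounds(alpha, beta):
    """Return ((1 - alpha) / beta, alpha / (1 - beta)) for given conditional risks."""
    return (1 - alpha) / beta, alpha / (1 - beta)


def consistent(alpha, beta, upper_limit, lower_limit):
    """True if the risks satisfy (iv) for sequential ratio limits A and B."""
    advancing, receding = ratio_bounds(alpha, beta)
    return advancing >= upper_limit and receding <= lower_limit


cases = [
    (Fraction(1, 10), Fraction(1, 20)),
    (Fraction(1, 20), Fraction(1, 20)),
    (Fraction(1, 20), Fraction(1, 40)),
    (Fraction(1, 40), Fraction(1, 40)),
]
for alpha, beta in cases:
    advancing, receding = ratio_bounds(alpha, beta)
    print(f"alpha = {float(alpha):.3f}, beta = {float(beta):.3f}: "
          f"(1 - alpha)/beta = {advancing}, alpha/(1 - beta) = {receding}")

# With A = 20 and B = 1/20 the risks 0.05 and 0.05 just fail; 0.04 and 0.04 pass.
print(consistent(Fraction(1, 20), Fraction(1, 20), 20, Fraction(1, 20)))
print(consistent(Fraction(1, 25), Fraction(1, 25), 20, Fraction(1, 20)))
```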

A proof of the rule stated w.r.t. ratio of advancing and receding totals for 2 binomial series 
is elementary ; and it will suffice to consider the case which arises when p_a = ½ and p_b = ½m 
in which 1 < m < 2. In this case, we may put m = (1 + e) in which e is positive, so that 

$$\frac{m}{2-m} = \frac{1+e}{1-e} = k > 1$$

Our acceptance criterion is 

$$S_a = \frac{1}{m^{a}(2-m)^{r-a}}$$

The ratio of the frequencies of the sum of scores less than or equal to a is 

$$R_a = \frac{\sum_{x=0}^{a}\binom{r}{x}}{\sum_{x=0}^{a}\binom{r}{x}\, m^{x}(2-m)^{r-x}}$$

We require to prove that R_a ≥ S_a and if R_a = K·S_a that K ≥ 1. This implies that 

$$K = \frac{R_a}{S_a} = \frac{\sum_{x=0}^{a}\binom{r}{x}\, m^{a}(2-m)^{r-a}}{\sum_{x=0}^{a}\binom{r}{x}\, m^{x}(2-m)^{r-x}} = \frac{\sum_{x=0}^{a}\binom{r}{x}\, k^{a}}{\sum_{x=0}^{a}\binom{r}{x}\, k^{x}}$$

Now every value of x in the denominator of the expression on the right is less than or equal 
to a. Since then k > 1, corresponding terms in the numerator are greater than or equal to those of the 
denominator, whence K ≥ 1. The reader should be able to complete the proof for receding 
totals in the same way. The generalisation of the rule for p_b > p_a, when p_a is not equal to 0.5 
is trivial, since the relevant ratios are then expressible in terms of p_a = ½m_a and p_b = ½m_b, 
in which m_b > m_a. 

Let us now return to our test prescription, viz. : 

(i) continue to enlarge the sample if A > S_r > B ; 

(ii) when S_r ≥ A terminate the test by accepting the null hypothesis ; 

(iii) when S_r ≤ B terminate the test by rejecting it. 


We have elsewhere (20.01 and 20.04) offered alternative interpretations of A and B. If we 
set α = 0.05 = β our choice will be A = 20 and B = 0.05. The procedure is then reducible 
to rule of thumb. If S_r ≥ A 

$$m^{x}(2-m)^{r-x} \le \frac{1}{A}\,;$$

$$\therefore\; x\log m + (r-x)\log(2-m) \le -\log A,$$

$$\therefore\; x \le \frac{\log A + r\log(2-m)}{\log(2-m) - \log m} \qquad \text{(v)}$$

Fig. 136. The Sequential Ratio Test Prescription for Double Dichotomies. The limiting ratios are 
9 (9:1) and 0.1 (1:9) as in the example. 


We can make a graph (Fig. 136) of values the linear expression on the right of (v) assumes 
for different values of r, terminating the test in acceptance when x falls below the line. Similarly, 
the condition S_r ≤ B implies 

$$m^{x}(2-m)^{r-x} \ge \frac{1}{B}\,;$$

$$\therefore\; x \ge \frac{\log B + r\log(2-m)}{\log(2-m) - \log m} \qquad \text{(vi)}$$

The example on p. 855 will suffice to illustrate the use of (v) and (vi), since we have made 
p_a = ½ and p_b = ⅔ = (4/3)p_a, so that m = 4/3. For the 8-fold sample we there cited x ≤ 1 for 
S_r ≥ 9 and x > 7 for S_r ≤ 0.1. Thus (v) and (vi) become 

$$x \le \frac{\log 9 + 8\log 2 - 8\log 3}{\log 2 - \log 4}$$

$$x \ge \frac{8\log 2 - 8\log 3 - \log 9}{\log 2 - \log 4}$$

This gives as the lower value x ≤ (0.4546 ÷ 0.3010) ≈ 1.5 so that only scores of x = 0 
or 1 satisfy the acceptance criterion and x ≥ (2.3630 ÷ 0.3010) ≈ 7.8 so that only the score 
x = 8 satisfies the rejection criterion as we have already shown directly. 
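
The two linear limits are simple to evaluate for any r. The sketch below is an illustration under stated assumptions: it takes m = 4/3 with the limiting ratios 9 and 1/9 (the 1 : 9 ratio, which the text rounds to 0.1), and the function names are invented for the purpose.

```python
from math import ceil, floor, log10


def sequential_ratio(x, r, m):
    """S_r = 1 / (m**x * (2 - m)**(r - x)) for x successes in r discordant pairs."""
    return 1.0 / (m**x * (2.0 - m) ** (r - x))


def acceptance_limit(r, m, upper_ratio):
    """Right-hand side of (v): accept the null hypothesis if x is at or below it."""
    return (log10(upper_ratio) + r * log10(2.0 - m)) / (log10(2.0 - m) - log10(m))


def rejection_limit(r, m, lower_ratio):
    """Right-hand side of (vi): reject the null hypothesis if x is at or above it."""
    return (log10(lower_ratio) + r * log10(2.0 - m)) / (log10(2.0 - m) - log10(m))


M = 4.0 / 3.0          # p_b = (4/3) p_a with p_a = 1/2
A, B = 9.0, 1.0 / 9.0  # limiting ratios 9 : 1 and 1 : 9 (assumed exact)

r = 8
print(f"accept if x <= {acceptance_limit(r, M, A):.2f}, "
      f"i.e. x <= {floor(acceptance_limit(r, M, A))}")
print(f"reject if x >= {rejection_limit(r, M, B):.2f}, "
      f"i.e. x >= {ceil(rejection_limit(r, M, B))}")

# Direct check against the sequential ratio itself.
for x in (1, 2, 7, 8):
    print(f"x = {x}: S_r = {sequential_ratio(x, r, M):.3f}")
```

Tabulating the two limits for successive values of r yields the pair of parallel lines of Fig. 136, between which sampling continues.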

If we invoke a sequential test of this type, we accept the onus of assigning to the parameter m 
a meaningful numerical value. In many biological enquiries, such as the clinical trial, we have 
little to suggest an appropriate figure. To get this dilemma into focus, let us recall the urn model 
at the beginning of this section. We shall denote by p_1 the proportion of red balls in Urn I 
and by p_2 the proportion of red balls in Urn II. The complete set-up is then as follows : 


  Concordances    (+ +)   p_1 p_2            (− −)   (1 − p_1)(1 − p_2) ; 
  Discordances    (+ −)   p_1(1 − p_2)       (− +)   (1 − p_1)p_2. 

Our null hypothesis is that 

$$\frac{p_1(1-p_2)}{p_1(1-p_2) + p_2(1-p_1)} = \tfrac{1}{2} = \frac{p_2(1-p_1)}{p_1(1-p_2) + p_2(1-p_1)}$$

This is necessarily true if Urn II contains the same proportion of red balls as does Urn I, i.e. 
p_1 = p_2. The alternative hypothesis implies that 

$$\tfrac{1}{2}m = \frac{p_2(1-p_1)}{p_1(1-p_2) + p_2(1-p_1)} = \frac{p_2 - p_1p_2}{p_2 - 2p_1p_2 + p_1}$$

$$\therefore\; m(p_2 - 2p_1p_2 + p_1) = 2p_2 - 2p_1p_2\,;$$

$$\therefore\; mp_1 = (2-m)p_2 + 2p_1(m-1)p_2\,;$$

$$\therefore\; p_2 = \frac{mp_1}{2 - m + 2p_1(m-1)} \qquad \text{(vii)}$$

If we can cite a reliable figure for p_1 referable to the first urn, we can now interpret m in terms 
of operational advantage, i.e. p_2 − p_1. Otherwise, the outcome of the test has little bearing 
on its presumptive aim. 
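
How much the meaning of m depends on p_1 is readily shown by evaluating (vii) for a few values. The sketch below is illustrative only; the function names and the chosen values of p_1 are assumptions.

```python
def p2_from_m(p1, m):
    """Formula (vii): the proportion p_2 implied by p_1 and the parameter m."""
    return m * p1 / (2.0 - m + 2.0 * p1 * (m - 1.0))


def success_share(p1, p2):
    """Probability that a discordant pair is of type (- +), i.e. a success."""
    return p2 * (1.0 - p1) / (p1 * (1.0 - p2) + p2 * (1.0 - p1))


M = 4.0 / 3.0  # same alternative as in the worked example, so that m/2 = 2/3

for p1 in (0.1, 0.3, 0.5, 0.7, 0.9):
    p2 = p2_from_m(p1, M)
    print(f"p_1 = {p1:.1f}  p_2 = {p2:.4f}  advantage p_2 - p_1 = {p2 - p1:.4f}  "
          f"share of successes = {success_share(p1, p2):.4f}")
```

The share of successes among discordant pairs is ½m in every row, whereas the operational advantage p_2 − p_1 varies with p_1, which is the dilemma the text describes.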


20.06 ESTIMATION AND CONFIDENCE 


The need to distinguish between two types of statistical inference as conditional and unconditional 
arises from two domains in which we can apply statistical principles in everyday life, namely : 
(a) the regulation of affairs; (b) the process of discovery. In commerce, manufacture and 
social administration we need rules of conduct to safeguard us against known risks. Our concern 
is then less with the truth of a principle than with the consequences to ourselves when it happens 
to be true or false, as the case may be. We do not therefore invoke statistical reasoning to 
justify our confidence in the truth of a particular hypothesis. All we need to know is how 
often a particular course of action will expose us to the penalties of rejecting its truth. This 
essentially conditional conception of the use of statistics has a long history in connexion with 
the rise of insurance, and is unexceptionable in its right place; but it has little relevance to 
what men of science have hitherto regarded as the proper goal of scientific research. If the 
legitimate scope of statistical reasoning has no concern with impersonal judgments about the 
truth of hypotheses, we must either dismiss its commonly asserted claim to be an essential 
component of scientific interpretation or abandon the traditional ethic of the scientific worker. 

Our examination of the use of test procedures in 20.03-20.04 has led us to the following 
conclusions: 


(i) the issues raised in the construction of the balance sheet of Thomas Bayes refer to 
situations in which the end in view is to pass judgment on the truth of one or other 
of a set of hypotheses referable to existent populations at risk ; 

(ii) since the terms of reference of conditional statistical inference, as here defined, exclude 
judgments of this sort, the balance sheet of Bayes has no bearing on the admitted 
usefulness of decision tests within the legitimate province of conditional assertions ; 

(iii) if circumstances entitle us to limit the range of admissible hypotheses which invite 
a verdict, it may be possible to devise decision tests which set an acceptable limit 
on the probability of a false choice without assuming that more than one such hypothesis 
is indeed referable to an existent population at risk ; 

(iv) since such circumstances will rarely arise in the conduct of research, the scope of 
unconditional statistical inference is very restricted, if exclusively reliant on test 
procedure ; 

(v) if we hold that unconditional statistical inference is an important feature of scientific 
interpretation, we can therefore justify its claims as such by invoking an alternative 
procedure. 


In statistical theory the term interval estimation has lately acquired a special meaning to denote 
such an alternative. We can define it naively as in 20.00 to emphasise what distinguishes the 
end in view from that of the decision test, or in a more sophisticated way, as later, to exhibit 
the pattern common to each procedure. At the outset, however, it is essential to be clear about 
what we do not mean by estimation in this context. Emphatically, we do not here mean, as 
is implicit in the all too common expression the best estimate, a procedure which assigns some 
unique value to a parameter of a universe, e.g. the proportion of cards of a particular denomina- 
tion in a pack or the true mean score of the toss of a die. 

The use of the epithet best (and the preceding definite article) qualifying the word estimate 
is so widely current as to justify a brief digression, since its literal semantic implications are 
indeed wholly inconsistent with a modern attitude to the sort of statements which estimation 
procedure undertakes to justify. In short, we approach the problem of estimation with an 
emotional block at the outset, unless we realise that the end in view is not to cite some single 
best figure. A so-called point-estimate is not one to which we can assign an acceptable uncer- 
tainty safeguard within the domain of unconditional statistical inference. In different contexts, 
writers on statistics attach the adjective best to a unique sample estimate to signify that it is: 
(a) unbiased ; (b) most efficient; (c) sufficient. The first merely means that we arrive at it 
by a process which would lead us to the right answer if we repeated it on similar samples suffi- 
ciently often, an assertion which does not get us very far if we have only one sample at our 
disposal. The second signifies that the method we use to derive it would ensure the least 
uncertainty about the range in which it lies; but does not imply a precise specification of the 
location of the latter. The third signifies that it embodies all the relevant information the 
sample supplies ; but tells us nothing about what conclusions the relevant information entitles 
us to draw. 

With these distinctions in mind let us consider a conundrum beloved by pedagogues of 
the more prosy sort. If I have tossed ten times a penny which came down heads upwards every 
time, what can I conclude about the probability that it will come down heads next time? 
If we denote the single spin probability of heads by p, the best estimate of p in each sense of 
the term as used above is then p = 1, since (i) the observed proportionate sample score of a 
binomial variate is an unbiased estimate of the unique parameter p; (ii) the variance of the 
binomial distribution is zero when p = 1; (iii) the parameter p suffices to define the distribution 
for an assigned sample size. My so-called best estimate will thus signify that the penny bears 
a head on both faces.* 

The modern viewpoint associated with the terms confidence and interval estimation repu- 
diates the undertaking to make any such statement. What it does undertake is to specify a 
range of values (confidence interval) within which a parameter lies. In conformity with our 
initial definition of statistical inference (20.00) such a specification presupposes an uncertainty 
safeguard. In other words, we say in effect: (a) the appropriate answer (i.e. limits assigned 
to the range) will not be invariably true ; but it will be right nine times out of ten or ninety-nine 
times out of a hundred or with whatever corresponding frequency you care to assign; (b) how 
precise an answer I can give you (i.e. how narrow the prescribed limits of the range) will depend 
on how much liability to error you are willing to condone; (c) when you have made up your 
own mind about how much fallibility you will concede to me in return for how much I may 
legitimately claim for a more definite assertion, we can get down to business. In terms of what 
we now call confidence intervals the argument proceeds as a public symposium in which the 
possibility of concord is contingent on an accepted framework of precisely specified fallibility. 

The development of the theory is largely due to J. Neyman. In retrospect, and for reasons 
given in 16.05, Gosset's pioneer paper on the t-distribution has a special interest in connexion 
with its beginnings, because it opened the door to an exact method of estimation of the mean 
of a normal universe; and it is reasonable to suppose that a premonition of its relevance from 
that viewpoint motivated Fisher’s appreciation of its importance before it gained recognition. 
Credit for what is seemingly the earliest explicit statement of the common sense of the confidence 
approach in the taxonomic domain is due to Wilson (1927). Since Wilson’s contribution has 
received very little recognition, it will not be out of place to quote his words : 

In 1927 I called attention to the fact that many statements about probability are highly 
elliptical and illustrated the matter by the simple case of a point-binomial universe with unknown 
probability p and observed value $p_0$ in some sample. Using the admittedly rough estimate of 
probability based on the standard deviation one ordinarily writes 

\[ p_0 - \lambda\sqrt{p_0 q_0/n} < p < p_0 + \lambda\sqrt{p_0 q_0/n} \]

and states that the probability that the true value p in the universe lies between the limits given 
may be had from a probability-integral table entered with a normal deviation of $\lambda$ units. I urged 


* Kendall’s (1946) remarks (Advanced Theory of Statistics, Vol. II, p. 2) are eminently quotable in this context : 

“ It will clarify our ideas considerably if we draw a distinction between the method or rule of estimation, 
which, following Pitman, we shall call an Estimator, and the value to which it gives rise in particular cases, 
the Estimate. The distinction is the same as that between a function f(x), regarded as defined for a range 
of the variable x, and the particular value which the function assumes, say f(a), for a specified value of x equal 
to a. Our problem is not to find estimates, but to find Estimators. We do not reject a method because it 
gives a bad result in a particular case (in the sense that the estimate differs materially from the true value). 
We should only reject it if it gave bad results in the long run, that is to say, if the population of possible values 
of the estimator were seriously discrepant with the value of $\theta$. The merit of the estimator is judged by the 
population of estimates to which it gives rise. It is itself a random variable and has a distribution to which 
we shall frequently have occasion to refer.” 

The reader will note that the best estimator so defined is not a number, but a rule which would lead us to find a 
number in the most economical and reliable way, if consistently followed. 
that a better procedure would be to use for the standard deviation the value $\sqrt{pq/n}$ obtained from 
the unknown p of the universe which leads to 

\[ \frac{p_0 + t/2}{1 + t} - \frac{\sqrt{p_0 q_0 t + t^2/4}}{1 + t} < p < \frac{p_0 + t/2}{1 + t} + \frac{\sqrt{p_0 q_0 t + t^2/4}}{1 + t} \]

[where $t = \lambda^2/n$] 
(Proc. Nat. Acad. Sci., Vol. 42, 1942) 
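
The contrast Wilson draws can be sketched in a few lines of Python. This is an illustration and not part of his text: the function names are our own, we take $t = \lambda^2/n$, and we choose $\lambda = 2$ as a convenient normal deviate. The trial case is the penny which falls heads upwards ten times in ten tosses.

```python
import math

def usual_limits(p0, n, lam):
    """p0 -/+ lam*sqrt(p0*q0/n): standard deviation taken from the sample value p0."""
    half = lam * math.sqrt(p0 * (1 - p0) / n)
    return p0 - half, p0 + half

def wilson_limits(p0, n, lam):
    """Wilson's limits: standard deviation taken from the unknown p, with t = lam^2/n."""
    t = lam * lam / n
    centre = (p0 + t / 2) / (1 + t)
    half = math.sqrt(p0 * (1 - p0) * t + t * t / 4) / (1 + t)
    return centre - half, centre + half

# Ten heads in ten tosses, lam = 2 (illustrative values).
print(usual_limits(1.0, 10, 2.0))   # (1.0, 1.0): an interval of zero width
print(wilson_limits(1.0, 10, 2.0))  # roughly (0.714, 1.0)
```

The ordinary procedure here returns an interval of zero width about p = 1, whereas Wilson's limits leave room for values of p well below unity.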


We have touched on the elements of confidence theory in Chapter 5 of Vol. I; but it will 
not be redundant to retrace our steps, if only to emphasise a nicety of definition. In conformity 
with the usage of the founding fathers of the theory of probability, many statisticians prefer to 
restrict the term probability in a practical sense to denote the long-run frequency of an external 
occurrence in contradistinction to the long-run frequency of a correct judgment concerning 
such an occurrence. If so, confidence is not probability. From a behaviouristic viewpoint, 
however, there is no obvious objection to the use of the same term for the frequency of events 
which respectively do or do not involve human behaviour at the verbal level, if we make the 
distinction explicit when appropriate, recognising in what situations it is so. When we take 
a sample from a homogeneous universe, the parameter (p) concerning which we seek an estimate 
has a unique value, the probability of which in any meaningful sense is unity. Thus we can 
speak with propriety of P as the probability that the parameter actually lies within assigned limits 
if, and only if, P has one of two values, viz. P = 0 or P = 1; but we may state with propriety, and 
without any such restriction, that P is the probability of a correct assertion about limits between 
which the parameter lies. The probability of making a correct assertion by consistent application 
of a rule, and that is what we here mean by confidence, may indeed have any value between 0 and 1 
in conformity with the way in which we define the limits our assertion sets to the values p may take. 

It will be easy to get into focus the formal implications of this distinction, if we examine 
artificial situations for which we can construct a model game-of-chance. Those which follow 
in this section prescribe a homogeneous universe concerning which we seek to estimate some 
single definitive parameter. 

Model Ia. We shall conceive that a lottery wheel has 1024 sectors labelled with scores 
x, (x + 1), (x + 2), (x + 3) . . . (x + 9), (x + 10) respectively allocated to 1, 10, 45, 120, 210, 
252, 210, 120, 45, 10, 1 sectors. So much we know; but we do not know the numerical value 
of x itself. At each spin we record as our score that of the sector opposite a fixed pointer. We 
now suppose that we spin the wheel 40 times and record the mean score ($M_x$) of the 40-fold 
sample as 6.3. Our problem is to define what we can legitimately assert about x. 

The long-run mean value (M) of the score of any sample is, of course, (x + 5); and the 
terms of $(\frac{1}{2} + \frac{1}{2})^{10}$ define the u.s.d. of the universe with variance $\sigma^2 = 2.5$, whence that of the 
distribution of the 40-fold sample mean is 

\[ \sigma_m^2 = \frac{2.5}{40} \]

Thus $\sigma_m = 0.25$; and the error involved in a normal quadrature for the distribution of the 
sample means is trivial. We can thus say that 

(a) the mean ($M_x$) of 2.5 per cent. of all samples in the long run will exceed 
$M + 2\sigma_m = M + 0.5$; 
(b) the mean of 2.5 per cent. of all samples in the long run will be less than 
$M - 2\sigma_m = M - 0.5$; 
(c) the mean of 95 per cent. of all samples will lie in the range $M \pm 2\sigma_m = M \pm 0.5$. 

To say of 2.5 per cent. of all samples that $M_x > M + 2\sigma_m$ is to say of 2.5 per cent. of all 
samples that $M_x - 2\sigma_m > M$, in which event $M < M_x - 0.5$. To say of 2.5 per cent. of all 
samples that $M_x < M - 2\sigma_m$ is to say of 2.5 per cent. of all samples that $M_x + 2\sigma_m < M$, 
in which event $M > M_x + 2\sigma_m$. The assertion that the sample value ($M_x$) lies within the range 
$M \pm 2\sigma_m$ will be true of 95 per cent. of all samples, i.e. the probability that it is true is 0.95. 
Since such an assertion is formally identical with the statement that M itself lies within the 
range $M_x \pm 2\sigma_m$, we can assign a probability of 0.95 (95 per cent. confidence level) to the truth 
of the assertion that M lies within the range 6.3 ± 0.5, i.e. from 5.8 to 6.8 inclusive. Since 
M = x + 5 by definition, we can say with equal confidence that x itself lies within the range 
0.8 to 1.8 inclusive, assigning 1 as the correct value (at the 95 per cent. confidence level), if x 
is an integer. 
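
The arithmetic of the lottery wheel model is easy to check mechanically. The following Python sketch is an illustration only; the function names are our own invention. It recovers the variance of the unit score from the sector counts and then the 2-sigma limits for M and for x.

```python
import math

# Sector counts for scores x+0, x+1, ..., x+10, i.e. the terms of 1024*(1/2 + 1/2)^10.
SECTORS = [1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1]

def universe_moments(sectors):
    """Mean and variance of the unit score, measured from the unknown x."""
    total = sum(sectors)
    mean = sum(k * f for k, f in enumerate(sectors)) / total
    var = sum((k - mean) ** 2 * f for k, f in enumerate(sectors)) / total
    return mean, var

def two_sigma_limits(sample_mean, spins, sectors=SECTORS):
    """Limits M_x -/+ 2*sigma_m for M, and the corresponding limits for x."""
    offset, var = universe_moments(sectors)
    sigma_m = math.sqrt(var / spins)
    m_lo, m_hi = sample_mean - 2 * sigma_m, sample_mean + 2 * sigma_m
    return (m_lo, m_hi), (m_lo - offset, m_hi - offset)

m_limits, x_limits = two_sigma_limits(6.3, 40)
print(m_limits)  # approximately (5.8, 6.8)
print(x_limits)  # approximately (0.8, 1.8)
```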


[Fig. 137. A Confidence Model: The Region of Confidence for the lottery wheel model of p. 888. 
Abscissa: M from 5 to 9, with the corresponding x from 0 to 4; ordinate: the sample mean $M_x$. 
The parallel lines $M_x = M \pm 2\sigma_m$ enclose 95 per cent. of sample values of $M_x$; 2.5 per cent. lie 
above the upper line, where $M < M_x - 2\sigma_m$, and 2.5 per cent. lie below the lower line, where 
$M > M_x + 2\sigma_m$. For the observed value $M_x = 6.3$ the limits are $M = 6.3 - 2\sigma_m = 5.8$ (x = 0.8) 
and $M = 6.3 + 2\sigma_m = 6.8$ (x = 1.8).] 

We can set out the above reasoning in tabular form thus: 

Event                                        Probability of   Equivalent assertion                          Probability of
                                             its occurrence                                                 its truth
$M_x > M + 2\sigma_m$                        0.025            $M < M_x - 2\sigma_m$                         0.025
$M_x < M - 2\sigma_m$                        0.025            $M > M_x + 2\sigma_m$                         0.025
$M - 2\sigma_m \le M_x \le M + 2\sigma_m$    0.95             $M_x + 2\sigma_m \ge M \ge M_x - 2\sigma_m$   0.95


Fig. 137 exhibits the argument based on our lottery wheel model within the range of values 
$5 \le M \le 9$ and $0 \le x \le 4$. For any value of M we deny the occurrence of all values of $M_x$ 
greater than $M + 2\sigma_m$ or less than $M - 2\sigma_m$ with a probability of erroneous assertion approximately 
equal to 0.05. Thus 95 per cent. of all sample values of $M_x$ will lie within the two parallel 
lines $M_x = M + 2\sigma_m$ and $M_x = M - 2\sigma_m$. There will correspond to any observed value 
of $M_x$ two values of M where the line through $M_x$ parallel to the abscissa cuts these two lines. 
These two values will define the range of M consistent with the probability of error assigned 
to our denial of the limits of admissible values of $M_x$. 

In one respect, the foregoing model is highly artificial, i.e. we know in advance the numerical 
value of the variance ($\sigma^2$) of the u.s.d. and hence that of $\sigma_m$. When sampling from a putatively 
normal universe we rarely, if ever, have such knowledge; but we do know the distribution of 
the ratio of ($M_x - M$) to the unbiased sample estimate ($s_m$) of $\sigma_m$. Hence as explained in 16.09 
we can get from the t-table upper and lower limits for M consistent with any assigned probability 
of erroneous statement within the framework of repeated application of the rule. We can do 
the same for $\sigma^2$ itself by recourse to tables of the Chi-Square distribution as explained in 16.03. 

Model Ib. We now suppose that our lottery wheel has 100 sectors on each of which the 
number of pips is either 0 or 1. We do not know the number (100q) of sectors which carry 
no pips or of sectors 100p = 100(1 - q) which carry one pip. We spin it 100 times and record 
the mean score of the 100-fold spin as 0.62. Our problem is to define confidence limits of 
p, the proportion of sectors which carry one pip. We are here sampling in an infinite two-class 
universe, and successive terms of $(q + p)^{100}$ define the frequencies of the observed proportionate 
(mean) score $p_0$ = 0, 0.01, 0.02, 0.03, . . . 0.99, 1.0. The unknown variance of the distribution 
of $p_0$ is given by 

\[ \sigma_p^2 = \frac{p(1 - p)}{100} \]

Throughout the range of prescribed values other than p = 0 or p = 1 the distribution 
of the observed proportionate score will be approximately normal. The range $p_0 = p \pm 2\sigma_p$ 
will therefore define the 95 per cent. confidence level well enough for heuristic purposes. Since 
$\sigma_p$ depends on p, being zero when p = 0 or p = 1, the two boundaries of acceptable values 
of $p_0$ will not be parallel straight lines as in Fig. 137. They will meet at p = 0 and p = 1, 
the upper being concave downwards, the lower being concave upwards. The corresponding 
acceptable range of p values for any observed value of $p_0$ is obtainable graphically (Fig. 138), 
as before, by drawing a horizontal line to parallel the abscissa; but each limit is subsumed by 
the two roots of the quadratic 

\[ (p_0 - p)^2 = 4\sigma_p^2 = \frac{4pq}{100} \]

If the observed mean value is 0.62 this becomes 

\[ 25(0.62 - p)^2 = p(1 - p), \]
\[ \therefore\; 26p^2 - 32p + 9.61 = 0, \]
\[ \therefore\; p \simeq 0.52 \text{ or } 0.71. \]
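
As a check on the arithmetic, the quadratic can be solved for any observed score and sample size. The Python sketch below is illustrative only; the function name and the symbol r for the sample size are our own choices. It writes $(p_0 - p)^2 = 4p(1 - p)/r$ in the standard form $(r + 4)p^2 - (2rp_0 + 4)p + rp_0^2 = 0$, which for r = 100 and $p_0 = 0.62$ is four times the quadratic given above.

```python
import math

def two_sigma_limits_for_p(p0, r):
    """Roots of (p0 - p)^2 = 4*p*(1 - p)/r, i.e. (r+4)p^2 - (2*r*p0+4)p + r*p0^2 = 0."""
    a = r + 4
    b = -(2 * r * p0 + 4)
    c = r * p0 * p0
    root = math.sqrt(b * b - 4 * a * c)
    return (-b - root) / (2 * a), (-b + root) / (2 * a)

lower, upper = two_sigma_limits_for_p(0.62, 100)
print(round(lower, 2), round(upper, 2))  # 0.52 0.71
```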

[Fig. 138. Confidence in the Domain of the 2-Class Universe: Graphical Interpretation of the 
Quadratic Solution of Confidence Limits in the Taxonomic domain. Abscissa: true value of p from 
0 to 1.0; ordinate: observed proportionate score ($p_0$). For an r-fold sample the limits are 
\[ p = \frac{(r p_0 + 2) \pm \sqrt{(r p_0 + 2)^2 - r(r + 4)p_0^2}}{r + 4}, \] 
the example worked in the figure giving $p \simeq 0.66$ or 0.83.] 


At the $2\sigma$ confidence level we shall therefore say that our lottery wheel has no more than 71 
and no less than 52 sectors carrying one pip. There is no need to generalise this case, already 
dealt with in Chapter 5 of Vol. I. Four points call for comment: 


(i) the foregoing method is valid only if we can assume that the normal curves give a 
good quadrature for the binomial distribution, the relevant condition being as stated 
in 14.07; 


(ii) even so, we can assign the appropriate confidence level correctly only if we pay attention 
to the half interval correction; 

(iii) if the size of the sample is small we can define, for any value of p, limits which exclude 
at least 2.5 per cent. (or other agreed figure) at either end of the range by recourse 
to the tables of the binomial; * 


(iv) if the size of the sample is large, a very small observed value of $p_0$ may suggest that 
the Poisson distribution gives a better summation of terms than the normal integral. 


Model Ic. It limits our horizon unduly, if we confine our interpretation of confidence 
limits to situations in which we can assume without appreciable error that our score distribution 
is approximately normal and the confidence interval itself expressible in terms of its variance. 
The latter has no relevance when the universe is rectangular; and we may therefore deepen 
our insight into the logic of confidence theory, if we now lay aside any preoccupations with 
the normal distribution. As an elementary example of the confidence approach to estimation 
in the rectangular universe, we may consider the following model. A lottery wheel has s sectors 
with consecutive scores from 1 to s, so that the proportion of sectors whose score value (x) exceeds 
m (< s) is (s - m) ÷ s. We shall suppose that we spin it once and record x. Our first problem 
will be: what can we legitimately say about s? 

In the treatment of the foregoing model we have side-stepped a limitation of interval estima- 
tion in the domain of discrete score values by assuming a good enough normal fit. Unless we 
postulate a continuous distribution we cannot in fact assign an uncertainty safeguard ($P_f = \alpha$) 
or confidence level $(1 - P_f) = (1 - \alpha)$ to an admissible range of score values. The best we 
can assert is a statement of the form $P_f \le \alpha$ or $P_f < \alpha$, as when we use tables of the binomial 
in the situation of Model I (b). One reason for this is that we can assign more than one value 
to m consistent with a fixed value $P_f = \alpha$ for the rule to disregard all samples if x > m. If the 
score x is an integer, e.g. k or (k + 1), we can postulate an infinitude of values to which we can 
assign the probability $\alpha$ that x > m in the range $k \le m < k + 1$. 

It will be convenient to write P(x > k) for the probability that x exceeds k and $P(x \ge k) 
= P(x > k - 1)$ for the probability that x is not less than k. If k is an integer there are (s - k) 
score values in the range x > k and (s - k + 1) in the range $x \ge k$, whence 

\[ P(x > k) = \frac{s - k}{s} \quad \text{and} \quad P(x \ge k) = \frac{s - k + 1}{s}. \]

If $k + 1 > m \ge k$, so that m is either an improper fraction in the interval between k and (k + 1) 
or is the integer k itself, we may write $m = (k + \epsilon)$ and $k = (m - \epsilon)$ for values of $\epsilon$ in the range 
$0 \le \epsilon < 1$. When $\epsilon = 0$, we may write 

\[ P(x > m) = \frac{s - m}{s} \quad \text{and} \quad P(x \ge m) = \frac{s - m + 1}{s}. \qquad \text{(i)} \]


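When $\epsilon > 0$, the whole-number scores which exceed m, or are not less than m, are those which 
exceed k, so that 

\[ P(x > m) = P(x \ge m) = P(x > k) = \frac{s - k}{s} = \frac{s - m + \epsilon}{s}, \qquad \text{(ii)} \]
\[ \therefore\; P(x > m) > \frac{s - m}{s} \quad \text{and} \quad P(x \ge m) > \frac{s - m}{s}. \]
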

* Tables of the Binomial Probability Distribution, 1950. National Bureau of Standards, Applied Mathematics 
Series, 6, Washington; Clopper and Pearson (1934), Biometrika, Vol. 26. 

We may subsume both equations (i) and (ii) to cover the possibility that m may or may not 
be a whole number in the expressions 

\[ P(x > m) \ge \frac{s - m}{s} \quad \text{and} \quad P(x \ge m) > \frac{s - m}{s}. \]

Let us now set $m = \alpha s$, so that 

Rule (i): $P(x > \alpha s) \ge 1 - \alpha$; 
Rule (ii): $P(x \ge \alpha s) > 1 - \alpha$, being equal to $(1 - \alpha) + 1/s$ when $\alpha s$ is a whole number. 

The proportion of all samples whose score x exceeds $\alpha s$ is thus no less than $100(1 - \alpha)$ per cent.; 
and the proportion of all samples whose score x is not less than $\alpha s$ is greater than $100(1 - \alpha)$ 
per cent. We may set out the implications of the foregoing statements as below: 


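Event               Probability of        Equivalent          Probability of            Probability of
                    its occurrence        assertion           its truth                 its falsehood
$x > \alpha s$      $\ge (1 - \alpha)$    $s < x/\alpha$      $P_t \ge (1 - \alpha)$    $P_f \le \alpha$
$x \ge \alpha s$    $> (1 - \alpha)$      $s \le x/\alpha$    $P_t > (1 - \alpha)$      $P_f < \alpha$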

When $\alpha = 0.05$, we may express this by saying that our uncertainty safeguard for the assertion 
that s is less than 20x does not exceed 5 per cent. and our uncertainty safeguard for the assertion 
that s does not exceed 20x is less than 5 per cent. On the basis of observations of a single spin 
with scores respectively x = 5 and x = 10, our assertions would thus take the following form, if we 
deem $P_f \le \alpha$ as an acceptable level of uncertainty: 

             x = 5       x = 10      $P_f$
Rule (i)     s < 100     s < 200     $\le 0.05$
Rule (ii)    s < 101     s < 201     $< 0.05$

To say that s < 100 in this context is to say that the upper confidence limit is 99. In terms of 
confidence limits we therefore write the above as 

Upper Confidence Limit of s 

             x = 5       x = 10      $P_f$
Rule (i)     99          199         $\le 0.05$
Rule (ii)    100         200         $< 0.05$

Why we cannot express our confidence level in the form of an exact specification of the 
uncertainty safeguard of the form $P_f = \alpha$ will be clear if we state the foregoing rules in another 
way. In effect, rule (i) signifies that we propose to disregard all samples if $x \le \alpha s$ and rule (ii) 
that we shall consistently disregard samples if $x < \alpha s$. We can get a backstage view of their 
implications, if we determine the proportion of excluded samples, i.e. the true uncertainty 
safeguard prescribed by each rule for values of s in the neighbourhood of 200, when $\alpha = 0.05$ 
defines the upper limit of acceptability for our uncertainty safeguard and the sample score is 
x = 10. For s = 199, 200 and 201 respectively $\alpha s$ = 9.95, 10 and 10.05. By rule (i) we disregard 
samples whose scores do not exceed 9, 10 and 10. The exact probabilities ($P_f$) of doing so are 
respectively 

\[ \frac{9}{199} ; \quad \frac{10}{200} ; \quad \frac{10}{201}. \]

By rule (ii) we disregard samples whose scores do not exceed 9, 9 and 10 with probabilities 

\[ \frac{9}{199} ; \quad \frac{9}{200} ; \quad \frac{10}{201}. \]

Thus the values of $P_f$ for s in the neighbourhood of 200 are 

s        Rule (i)       Rule (ii)
199      ≈ 0.045        ≈ 0.045
200      = 0.05         = 0.045
201      ≈ 0.0497       ≈ 0.0497

Rule (i) will make $P_f = 0.05 = \alpha$ when s is an exact multiple of $20 = \alpha^{-1}$; but otherwise 
$P_f < \alpha$. Rule (ii) makes $P_f$ nearly equal to $\alpha$ when s exceeds an exact multiple of 20 by unity, 
as the table shows for s = 201, but always less than $\alpha$. 
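
The exact uncertainty safeguard under each rule can be had by counting the excluded scores. The following Python sketch is an illustration only; the function name and its arguments are our own. It uses exact fractions, so that the boundary case in which $\alpha s$ is a whole number is not blurred by rounding.

```python
from fractions import Fraction

def exact_pf(s, alpha, strict_assertion):
    """Exact probability of a false assertion for a wheel with s sectors.

    strict_assertion=True  -> rule (i):  assert s <  x/alpha, false when x <= alpha*s
    strict_assertion=False -> rule (ii): assert s <= x/alpha, false when x <  alpha*s
    """
    if strict_assertion:
        excluded = [x for x in range(1, s + 1) if x <= alpha * s]
    else:
        excluded = [x for x in range(1, s + 1) if x < alpha * s]
    return Fraction(len(excluded), s)

alpha = Fraction(1, 20)
for s in (199, 200, 201):
    # Columns: s, P_f under rule (i), P_f under rule (ii)
    print(s, exact_pf(s, alpha, True), exact_pf(s, alpha, False))
```

Running the loop over a longer range of s shows that the safeguard under rule (i) touches $\alpha$ only at exact multiples of 20, and that the safeguard under rule (ii) never reaches it.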

The fatal fascination of the continuum to the theoreticians of statistics arises from the 
circumstance that such inequalities do not trouble us; and it is instructive to examine the 
consequences of invoking the continuous rectangular variate in this situation. To define a 
rectangular distribution as a continuous variate we have to satisfy two conditions: (a) the 
probability f(x)dx that a score lies in the range $x \pm \frac{1}{2}dx$ is constant for all values of x, i.e. 
f(x)dx = K.dx; (b) the complete integral is numerically equal to unity, i.e. 

\[ K\int_1^s dx = 1 = K(s - 1). \]

The probabilities that the score lies in the range from 1 to k or k to s are then exactly 

\[ P(x \le k) = \frac{1}{s - 1}\int_1^k dx = \frac{k - 1}{s - 1}, \]
\[ P(x \ge k) = \frac{1}{s - 1}\int_k^s dx = \frac{s - k}{s - 1}. \]


We cannot split a range of whole number score values from 1 to s into two regions $x \le k$ 
and $x \ge k$ as we can if x is a continuous variate; but we can often get a good approximation 
to a sum by integration if we invoke the half interval correction. We then interpret x > k to 
mean $x > (k + \frac{1}{2})$, so that 

\[ P(x > k) = \frac{1}{s - 1}\int_{k + \frac{1}{2}}^{s} dx = \frac{s - k - \frac{1}{2}}{s - 1} = \frac{2s - 2k - 1}{2(s - 1)}. \]

To make $P(x > k) = 1 - \alpha$, we have 

\[ \alpha = \frac{2k - 1}{2(s - 1)} \quad \text{and} \quad k = \alpha(s - 1) + \tfrac{1}{2}. \]

The statement $x > \alpha(s - 1) + \frac{1}{2}$ is equivalent to 

\[ s < \frac{x + \alpha - \frac{1}{2}}{\alpha}. \]

If $\alpha = 0.05$, this is equivalent to s < (20x - 9); and when x = 10, s < 191 with $P_f = 0.05$. 
The value of $\alpha$ consistent with x = 10 and s = 191 prescribed by rule (i) above is $\alpha$ = 10 ÷ 191 
≈ 0.0523; and the rule states that $P_f \le \alpha$, i.e. $P_f \le 0.0523$. 
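
The half interval correction lends itself to a brief numerical check. The Python sketch below is illustrative; the function names are assumptions of ours. It evaluates $k = \alpha(s - 1) + \frac{1}{2}$ and the equivalent bound on s in exact fractions.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def k_for_alpha(s, alpha):
    """Score boundary k = alpha*(s - 1) + 1/2 under the half interval correction."""
    return alpha * (s - 1) + HALF

def s_bound(x, alpha):
    """Bound on s equivalent to x > alpha*(s - 1) + 1/2, i.e. s < (x + alpha - 1/2)/alpha."""
    return (x + alpha - HALF) / alpha

alpha = Fraction(1, 20)
print(s_bound(10, alpha))         # 191
print(k_for_alpha(191, alpha))    # 10
print(float(Fraction(10, 191)))   # the value of alpha which rule (i) pairs with x = 10, s = 191
```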

We did not have to face the issue last discussed in the context of Model I (b), because we 
invoked a normal approximation for the summation of the terms of a truly discrete binomial 
sample distribution. It is therefore instructive to re-examine the foregoing model situation on 
the assumption that the score x is a continuous rectangular variate. We may then interpret 
$x \ge k$ as $x > (k - \frac{1}{2})$ and $x \le k$ as $x < (k + \frac{1}{2})$. To accommodate all discrete values in the 
range x = 1 to x = s inclusive we must accordingly extend the range of the continuous distribu- 
tion from $x = \frac{1}{2}$ to $x = (s + \frac{1}{2})$. On this understanding, our formal definition of the continuous 
rectangular distribution has merely to satisfy two conditions: (a) the probability f(x)dx that a 
score lies in the range $x \pm \frac{1}{2}dx$ is constant for all values of x, i.e. f(x)dx = K.dx; (b) the 
complete integral is numerically equal to unity, i.e. 

\[ K\int_{\frac{1}{2}}^{s + \frac{1}{2}} dx = 1 = Ks \quad \text{and} \quad K = \frac{1}{s}. \]

The probabilities that the score lies in the range from 1 to k or beyond k are then expressible as 

\[ P(x \le k) = \frac{1}{s}\int_{\frac{1}{2}}^{k + \frac{1}{2}} dx = \frac{k}{s} \quad \text{and} \quad P(x > k) = 1 - \frac{k}{s}. \]

The above statement is exactly true of the discrete distribution, since $P(x \le k) = P(x < k + \frac{1}{2})$ 
if x is necessarily an integer. In effect, we make our range from $\frac{1}{2}\Delta x$ to $s + \frac{1}{2}\Delta x$, since $\Delta x = 1$; 
and we may neglect $\Delta x$ if s is very large, as we must assume if we invoke the continuous distribu- 
tion as a descriptive device. We shall then say that the range is from 0 to s, and admit fractional 
values of x consistent with the specification 

\[ P(x \le k) = \frac{1}{s}\int_0^k dx = \frac{k}{s}. \]

Accordingly, we now proceed on the assumption that x can have any real value in the range 0 to s. 
To make $P(x \ge k) = 1 - \alpha$ we then put $k = s\alpha$, so that 

\[ P(x \ge s\alpha) = 1 - \alpha. \]

Within the framework of the rule implicit in the foregoing we then assign $(1 - \alpha)$ as the 
probability of correctly asserting 

\[ s \le \frac{x}{\alpha}. \]

When $\alpha = 0.05$, this is equivalent to assigning $P_f = 0.05$ to the assertion that s lies within the 
range from 1 to 20x. 

We have hitherto confined our attention to a procedure which entitles us to assign to s an 
upper confidence limit with an uncertainty safeguard $P_f \le \alpha$. If we wish to place it with a 
preassigned uncertainty safeguard in an interval $ax \ge s \ge bx$, the form of statement we may 
make is no longer unique. If we may justifiably proceed on the assumption that we can assign 
an exact uncertainty safeguard $P_f = \gamma$ to what assertions we do make within the framework of a 
prescribed rule of procedure, i.e. that we may legitimately rely as above on the continuous 
distribution, we may write 

\[ P(k \le x \le m) = \frac{1}{s}\int_k^m dx = \frac{m - k}{s}. \]

If we now write $k = \beta s$ and $m = \alpha s$ 

\[ P(\beta s \le x \le \alpha s) = \alpha - \beta. \]

We then assign an uncertainty safeguard $P_f = 1 - (\alpha - \beta)$ to the assertion 

\[ \frac{x}{\beta} \ge s \ge \frac{x}{\alpha}. \]

If $\beta = 0.025$ and $\alpha = 0.975$, so that $P_f = 0.05$, our final statement will thus be 

\[ 40x \ge s \ge \frac{40x}{39}. \]

Now $P_f = 0.05$ if $\beta = 0.01$ and $\alpha = 0.96$. We are therefore entitled to assign $P_f = 0.05$ as 
the uncertainty safeguard to the alternative assertion 

\[ 100x \ge s \ge \frac{25x}{24}. \]

When we write $P(x \ge s\alpha) = 1 - \alpha$ or $P(\beta s \le x \le \alpha s) = \alpha - \beta$, we state the probability 
of an event, i.e. value of unit score x, within the framework of the classical theory of 
probability and the convenient fiction that the distribution is continuous. Our assertion 
signifies: for the fixed value s of the relevant parameter, $P_{x.s}$ is the probability that the unit 
score will lie in such and such a range. We have refrained from writing the probability we 
assign to the equivalent assertions in the notation 

\[ P(s \le x\alpha^{-1}) = 1 - \alpha \quad \text{or} \quad P(x\beta^{-1} \ge s \ge x\alpha^{-1}) = \alpha - \beta \]



lest we should hastily interpret them in terms of inverse probability, i.e. as if we could 
legitimately say: for the fixed value x of the unit score, $P_{s.x}$ is the probability that s will lie in 
the specified range. Such a form of words is inconsistent with Neyman's theory. We must 
interpret a statement in the form $P(ax \ge s \ge bx) = \gamma$ as a summary of the long-run result of 
consistently adopting one and the same rule of conduct regardless of the value (e.g. x = 5) the 
score x may have in any single trial, including the particular trial to which our specification of 
the interval estimate is referable. The formal statement of the rule will be adequate only if it 
explicitly specifies x as an unknown which may assume any value within its admissible range. 
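
The long-run sense of such a rule can be exhibited by a small simulation. The Python sketch below is an illustration under stated assumptions: the score is a continuous rectangular variate in the range 0 to s, the function name, the seed and the number of trials are arbitrary choices of ours, and the values of s are merely examples. For each fixed s it spins the wheel repeatedly and records how often the assertion $x/\beta \ge s \ge x/\alpha$ is true.

```python
import random

def long_run_success(s, beta, alpha, trials=100_000, seed=0):
    """Fraction of spins for which the assertion x/beta >= s >= x/alpha is true."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        x = rng.uniform(0.0, s)           # one spin of the continuous wheel
        if x / alpha <= s <= x / beta:    # the same rule, whatever x turns up
            correct += 1
    return correct / trials

for s in (50.0, 200.0, 1000.0):
    # Both pairs (beta, alpha) give alpha - beta = 0.95.
    print(s, long_run_success(s, 0.025, 0.975), long_run_success(s, 0.01, 0.96))
```

Each printed frequency should lie close to 0.95 whatever the value of s. The figure describes the rule applied to all spins, not any one interval computed from a particular score.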
We misinterpret it if we condense our verdict in such a form as 

\[ P\left(200 \ge s \ge \frac{200}{39}\right) = 0.95. \]

This is an act of self deception into which we easily slide, if we write the formal identities: 

\[ P(s \le h) = 1 - \frac{x}{h}, \qquad P(s \le h + dh) = 1 - \frac{x}{h + dh}, \]
\[ \therefore\; P(h + dh \ge s \ge h) = \frac{x}{h} - \frac{x}{h + dh} \simeq \frac{x \cdot dh}{h^2}. \]

We have now eliminated any reference to x as a variable in the expression on the left and have 
obtained on the right what is seemingly the element of a probability distribution and satisfies 
the fundamental property of the latter, if we fix x and define the range of s from h = x to $h = \infty$, 
so that 

\[ \int_x^{\infty} x\,h^{-2}\,dh = 1. \]


This step, which leads to what Fisher calls a fiducial probability distribution, is admissible 
only if we can legitimately confine our statements to situations in which x has one and the same 
value (e.g. x = 5). We could then write 

\[ P(s \le k) = \int_x^k x\,h^{-2}\,dh = 1 - \frac{x}{k}. \]

If k = 20x, we thus obtain by a somewhat circuitous route a result already derived within 
the framework of the assumed continuous rectangular distribution, i.e. $P(s \le 20x) = 0.95$. 
The numerical consistency of many, though not all, results embodied in Fisher's approach to 
interval estimation with those to which the theory of confidence limits leads us did indeed at one 
time blind many statisticians to what we now see to be a radical difference. If we conceive 
$x \cdot f(h)dh$ as an element of a probability distribution, we have to regard h and x as independent 
to arrive at a numerical result consistent with confidence theory in the continuous domain ; but 
we can do so only if we then treat x as a constant in the algebraic manipulation. We thus 
implicitly fix our interval in terms of a preassigned value of x to arrive at the specification of a 
probability dependent thereon; but this is inconsistent with the programme of Neyman’s 
theory which specifies the interval in terms of a preassigned probability independent of the 
outcome of any single trial and hence of any preassigned value of x. 
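
The long-run reading of the rule P(s < 20x) = 0.95 can be checked numerically. The following is a minimal sketch, assuming a single score x drawn from a continuous rectangular distribution over the range 0 to s; the function name `upper_limit_coverage`, the trial count and the seed are illustrative choices, not part of the text.

```python
import random

def upper_limit_coverage(s, alpha=0.05, trials=100_000, seed=0):
    """Fraction of trials in which the assertion s < x / alpha is true
    when x is one score drawn uniformly from the range 0 to s (assumed)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = rng.uniform(0.0, s)      # one unit sample from the rectangular universe
        if s < x / alpha:            # the rule: assert s < 20x when alpha = 0.05
            hits += 1
    return hits / trials

# The same rule is applied whatever the unknown s may be.
for s in (3.0, 50.0, 1000.0):
    print(s, round(upper_limit_coverage(s), 3))
```

The frequency of correct assertion is attached to the rule itself: it stays close to 1 − α for every s tried, without fixing x at any preassigned value.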

We come to a parting of the ways, when we ask questions involving joint distributions, such 
as that of the so-called Behrens test invoked to estimate the difference between the means of 
different normal universes. In the confidence theory of interval estimation, we must first 
specify the composite distribution of relevant composite score values referable to all possible 
values of each variate, e.g. that of the r-fold sample mean difference $d_s$, referable to all possible
values the sample means ($M_{a.s}$ and $M_{b.s}$) may assume. The prescription of the Fisher
school derives a composite fiducial distribution from the particular fiducial distributions of
$M_{a.s}$ and $M_{b.s}$. Unless the variances $\sigma_a^2$ and $\sigma_b^2$ of the parent unit sample distributions are
equal, the two procedures lead to different results ; and this has provoked a lively controversy 
conducted with an output of heat disproportionate to the illumination conferred.* 


20.07 ESTIMATION AND THE BAYES’ DILEMMA 


We have hitherto regarded the problem of estimation as that of assigning a probability to the 
truth of the assertion that some definitive parameter of a homogeneous universe lies between speci- 
fied limits. So stated, the issue sidesteps the disquieting dilemma with which the balance sheet 
of Bayes confronts us. Bayes’ theorem is essentially about a stratified (heterogeneous) universe, 
e.g. a bag in which some pennies are unbiased and one penny (through a defect of minting) 
has the King’s head on both sides. In effect, it says: to know how often I should be right 
in judging a coin taken from the bag as the one defective coin after getting 10 successive heads 
in a single 10-fold toss, I must also know how many other coins the bag contains. The dilemma 
to which the theorem draws attention is that we rarely have such knowledge ; but it is one which 
the theory of confidence sidesteps. The confidence theory of interval estimation deals with 
a Bernoullian universe, e.g. a bag containing pennies all of the same sort; and formulates 
how much we can say with propriety about the behaviour of such pennies on the basis of a 
single 10-fold sample. As we have seen, it relinquishes the attempt to assign a so-called best 
value for the long-run frequency (p) of heads as the result of tossing any one of them in pre- 
ference to an exact statement in terms such as the following : though I cannot say what is the 


* Behrens, W. V. (1929), Landw. Jb., 68, 807. Fisher, R. A. (1935), Ann. Eug. Lond., 6, 391. Sukhatme, P. V. 
(1938), Sankhya, 4, 39. Neyman, J. (1941), Biometrika, 32, 128. 
correct or best value of p, I can tell you within what limits p will lie if you will agree to let me
be wrong not more than αN times (e.g. α = 0.05) in N trials when N is very large. It is the writer’s
belief that Neyman (1934) does not overstate the novelty or the importance of the viewpoint 
explored in 20.06 when he declares : 

The solution of the problem which I described as confidence intervals has been sought by 
the greatest minds since the work of Bayes 150 years ago. Any recent book on the theory of 
probability includes large sections concerning this problem. The present solution means, I think, 
not less than a revolution in the theory of statistics. (J. Roy. Stat. Soc., Vol. 97, p. 536.)


We shall now examine model situations which suggest an alternative more sophisticated
approach to the problem of confidence to clarify what is common to the domain of decision tests
and the domain of estimation. It is also of special interest for another reason mentioned at the
end of 20.03 above. Till recently, it has been common to assume that an adequate theory of
statistical decision must come to terms with the prior probabilities of Bayes’ theorem. This 
belief leads to an impasse unless we are content to embrace the highly exceptionable postulate 
mentioned in 20.03 ; but it rests on a debatable assumption that the model situation with which 
Bayes’ theorem deals is factually relevant to statistical decisions involving no more than one 
hypothesis referable to an existent population at risk. We can best see the irrelevance of the 
theorem to the issue of estimation if we : (a) provisionally postulate a model situation to which 
it is indeed factually relevant ; (b) formulate a procedure which is valid for all conceivable values 
of the prior probabilities and therefore to the limiting case when there is only one urn. The 
universe of our models of this section will be a stratified universe, and our problem to attach 
an acceptable uncertainty safeguard to the assertion that a parameter definitive of the single 
stratum from which we take a particular sample lies within a specified range. 

Model IIa. With this end in view we shall suppose that someone spins 40 times one of 
100 lottery wheels chosen at random. Each such wheel has 1024 sectors like the wheel of our 
first model in 20.06 with scores of x, (x + 1), (x + 2), . . . (x + 9), (x + 10), allocated respectively
to 1, 10, 45, . . . 10, 1 sectors. The recorded score of the 40-fold spin is again 6.3,
and we do not know the value of x associated with the particular wheel selected for the spin.
We do know, however, that each wheel is one of eleven types as follows :


Type     No. of wheels     Value of x
I              1              0.5
II             3              0.6
III           10              0.7
IV            17              0.8
V             20              1.1
VI             7              1.3
VII           12              1.5
VIII           3              1.8
IX             8              1.9
X              2              2.0
XI            17              2.1


In this model set-up, we may construct 11 admissible hypotheses about the value of x, 
and hence of the expected mean M = (x + 5). For each hypothesis the standard deviation
of the distribution of the observed mean ($M_s$) of the 40-fold spin is $\sigma_m$ = 0.25, and to each
hypothesis we can assign a prior probability in Bayes’ sense. From this point of view, the
relevant information is as follows :

Hypothesis     Prior Probability      M      $(M - M_s) \div \sigma_m$
I                   0.01             5.5         − 3.2
II                  0.03             5.6         − 2.8
III                 0.10             5.7         − 2.4
IV                  0.17             5.8         − 2.0
V                   0.20             6.1         − 0.8
VI                  0.07             6.3           0
VII                 0.12             6.5         + 0.8
VIII                0.03             6.8         + 2.0
IX                  0.08             6.9         + 2.4
X                   0.02             7.0         + 2.8
XI                  0.17             7.1         + 3.2


We shall now make the following rule. We shall reject some hypotheses as inadmissible
and reserve judgment on others which we shall accordingly regard as admissible, applying
to each hypothesis the same criterion of rejection, i.e. that it assigns to the deviation of the
observed score ($M_s$ = 6.3) from the expected value (M) prescribed by the particular hypothesis
a value numerically greater than $2\sigma_m$. We then reject all hypotheses except IV-VIII inclusive,
and are left with the assertion that M lies in the range 5.8-6.8 corresponding to values of x from
0.8 to 1.8.
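
The screening of the eleven hypotheses can be retraced mechanically. The sketch below is illustrative only: the names `X_BY_TYPE`, `standard_score` and `admissible_types` are our own, and exact fractions are used so that the boundary cases (a deviation of exactly twice the standard deviation) are not disturbed by rounding.

```python
from fractions import Fraction

# Model IIa: value of x for each type of wheel, as tabulated in the text.
X_BY_TYPE = {
    "I": "0.5", "II": "0.6", "III": "0.7", "IV": "0.8", "V": "1.1", "VI": "1.3",
    "VII": "1.5", "VIII": "1.8", "IX": "1.9", "X": "2.0", "XI": "2.1",
}
OBSERVED_MEAN = Fraction("6.3")   # recorded score of the 40-fold spin
SIGMA_M = Fraction("0.25")        # standard deviation of the 40-fold mean
LIMIT = 2                         # reject if the deviation exceeds 2 sigma numerically

def standard_score(x):
    """(M - M_s) / sigma_m for the hypothesis that the wheel has this x."""
    expected_mean = Fraction(x) + 5
    return (expected_mean - OBSERVED_MEAN) / SIGMA_M

def admissible_types():
    """Types on which we reserve judgment, i.e. those not rejected."""
    return [t for t, x in X_BY_TYPE.items() if abs(standard_score(x)) <= LIMIT]

print(admissible_types())
```

Types IV and VIII sit exactly on the boundary; they remain admissible because the criterion rejects only deviations numerically greater than two standard deviations.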

Our uncertainty safeguard for rejection of every hypothesis when true is 0.05 since our
rejection criterion is modular. That the unconditional uncertainty safeguard for the final
verdict is also 0.05 as for Model Ia, we may make explicit as follows. We first remind ourselves
that we can falsely reject only one hypothesis, since only one can be true. Thus the unconditional
probability of a false verdict is the unconditional probability of falsely rejecting one or
other of an exclusive set of hypotheses, and is therefore obtainable by recourse to the addition
rule. If $P_h$ is the prior probability that the particular hypothesis H is applicable to the
situation, i.e. that we chose at random a wheel of type H to spin, the probability of falsely
rejecting it is $\alpha P_h$ ; and by definition

$$\sum_{h=1}^{h=11} P_h = 1.$$

The probability of making a false decision is the probability of falsely rejecting any one of the
hypotheses, i.e.

$$\sum_{h=1}^{h=11} \alpha P_h = \alpha \sum_{h=1}^{h=11} P_h = \alpha.$$


Thus α is our uncertainty safeguard to the assertion that M lies within the prescribed limits ;
and the prior probabilities of Bayes do not affect its value. We arrive at exactly the same result 
as for the corresponding situation (Model Ia) of 20.06, where we set the same uncertainty 
safeguard to the same range of admissible values of the parameter x of one and the same 
wheel. 
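
That the prior probabilities drop out can be verified by direct computation. The sketch below rests on one observation not spelled out above: each spin scores x plus the number of successes in 10 fair binary trials (the sector counts 1, 10, 45, . . . being binomial coefficients), so a 40-fold spin is equivalent to 400 such trials whatever x may be. The function names and the alternative sets of prior probabilities are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def false_rejection_rate():
    """Exact chance that the 40-fold mean deviates from its expectation by more
    than 2 * 0.25 = 0.5, i.e. that the total of 400 fair binary trials deviates
    from 200 by more than 20. The value of x plays no part in it."""
    retained = sum(comb(400, k) for k in range(180, 221))
    return 1 - Fraction(retained, 2 ** 400)

def unconditional_error(priors, alpha):
    """Addition rule: sum over all hypotheses of P_h * alpha."""
    assert sum(priors) == 1
    return sum(p * alpha for p in priors)

alpha = false_rejection_rate()
model_priors = [Fraction(n, 100) for n in (1, 3, 10, 17, 20, 7, 12, 3, 8, 2, 17)]
equal_priors = [Fraction(1, 11)] * 11      # a different, hypothetical bag of wheels
single_wheel = [Fraction(1)]               # the unstratified limiting case

for priors in (model_priors, equal_priors, single_wheel):
    print(float(unconditional_error(priors, alpha)))
```

For this discrete wheel the exact conditional rate comes out somewhat below the nominal 0.05 of the normal approximation; the point of the sketch is only that the same figure is returned for every set of prior probabilities.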

Should the reader find the last step of this argument difficult to follow, it may be helpful 
to set it out in the form of a truth table for which it will suffice to predicate only 4 hypotheses 
with prior probabilities $P_1$, $P_2$, $P_3$, $P_4$, and corresponding definitive parameters $M_1$, $M_2$, etc.
Each hypothesis corresponds to a fictitious possible universe, which is a wheel of a given type 
in our model situation. The hypothesis we deem to be applicable to the situation is the actual 

universe from which our sample comes, i.e. the particular wheel of which we record the outcome
of a 40-fold spin. We may then set out the procedure in stages as follows :


Prior Probability of      Probability of Rejection               Probability of Retention
choosing the wheel        as such (α)                            as possibly such (1 − α)

$P_1$                     $P_1\alpha$                            $P_1(1 - \alpha)$
$P_2$                     $P_2\alpha$                            $P_2(1 - \alpha)$
$P_3$                     $P_3\alpha$                            $P_3(1 - \alpha)$
$P_4$                     $P_4\alpha$                            $P_4(1 - \alpha)$

Total                     $\alpha(P_1 + P_2 + P_3 + P_4) = \alpha$     $(1 - \alpha)(P_1 + P_2 + P_3 + P_4) = (1 - \alpha)$


In this table each cell entry of the second column records the probability that we shall both 
choose a particular wheel to spin and reject the conclusion that we have done so. Each cell entry 
on the right records the probability that we suspend judgment. The grand total of all the
cell entries is α + (1 − α) = 1. Thus our decisions are classifiable exclusively and exhaustively
as either definitely false or uncertain, and we may interpret our balance alternatively in terms
of probabilities assignable to our decisions as follows :


                         Decision
Hypothesis      False              Non-committal
I               $P_1\alpha$        $P_1(1 - \alpha)$
II              $P_2\alpha$        $P_2(1 - \alpha)$
III             $P_3\alpha$        $P_3(1 - \alpha)$
IV              $P_4\alpha$        $P_4(1 - \alpha)$

Total           α                  1 − α


In the set-up of this Model, we regard any one of a limitless number of values p may have 
as a hypothesis referable to a conceivably, but not necessarily, existent population at risk. We 
thus interpret the process of estimation as a method of screening an exhaustive set of hypotheses 
as admissible or otherwise by successively applying to each a test prescribing the same probability 
of rejection if the hypothesis is indeed true. Our universe of hypotheses so conceived is a stratified
universe, in which strata with the same definitive parameter $p_h$ provisionally constitute
an existent population at risk with an assignable finite prior probability in the jargon of Bayes’
theorem. Bayes’ prior probabilities ($P_h$) are then relevant to the initial formulation of the
problem ; but they do not appear in the solution. Consequently, we are free to assign to the
prior probability of any single hypothesis any value in the range 0 to 1 consistent with the
restriction that the sum of all the prior probabilities is unity. Whether there corresponds an
existent population to a particular hypothesis in our fictitious stratified universe is therefore
immaterial. That a particular hypothesis to which we apply the test corresponds to no existent
population merely means that $P_h$ = 0. To conceive the universe as unstratified is to assign
$P_h$ = 1 to one stratum and $P_h$ = 0 to every other one. In this sense, Model I is therefore a
limiting case of Model II. 
This way of looking at the problem of estimation makes the distinction between the domain
of test decision and that of estimation less clear-cut than the alternative ; but we should not lose 
sight of what remains. If we perform a decision test to arbitrate simultaneously on the merits 
of alternative hypotheses which constitute an exhaustive set our rejection criterion or criteria 
determines which we accept and which we reject ; and we can never assign the same probability 
of rejection if true to more than 3 hypotheses on this understanding. If we interpret the procedure 
of estimation in terms of the model of this section, we can regard it as the performance of a 
battery of tests, but the score value which defines the criterion of rejection is different for each 
test and the decision to reject any one hypothesis or group of hypotheses does not prescribe 
acceptance of any other single hypothesis. We successively apply to each a test involving a new 
value of the score deviation (x — M) as the criterion which ensures the same probability of 
rejection for each hypothesis when true. If we assert that one group of hypotheses constitute 
an admissible in contradistinction to a residual group as an inadmissible set, we then do so on 
the assumption that one of the former is identifiable with the correct one. 

Model II (c). In the homogeneous universe of Model I, we have seen that we can set an
upper limit ($P_f \le \alpha$ or $P_f < \alpha$) to the uncertainty safeguard we attach to a confidence boundary
in the domain of discrete score values; but we cannot make an exact statement of the form
$P_f = \alpha$. Let us now therefore look at the problem raised by Model I (c) of 20.06 as one of
sampling in a stratified universe. We shall postulate as below an assemblage of 100 lottery
wheels of 12 types with consecutive scores 1 to $s_h$ inclusive if s = $s_h$ is the number of sectors
of a wheel of type H. Thus we have 12 hypotheses about s to explore, each referable to an
existent population at risk ; and we shall once more limit our decisions to rejection and reservation 
of judgment. We know the score x of a single spin without knowing the type of wheel to which 
it is referable. Our problem will be to assign a probability to an admissible set of hypotheses. 


Type of        No. of Sectors     No. of Corresponding     Prior Probability of Choice
Wheel (H)      ($s_h$)            Wheels ($N_h$)           ($P_h = N_h \div 100$)
1                  5                  13                       0.13
2                 19                   2                       0.02
3                 20                   1                       0.01
4                 21                   3                       0.03
5                 39                   7                       0.07
6                 40                  12                       0.12
7                 99                   3                       0.03
8                100                   4                       0.04
9                101                   9                       0.09
10               199                  10                       0.10
11               200                  15                       0.15
12               201                  21                       0.21
Total                                100                       1.00


For Model I (c) we formulated two rules

Rule (i)  $s < \dfrac{x}{\alpha}$ with $P_f \le \alpha$ ;

Rule (ii)  $s \le \dfrac{x}{\alpha}$ with $P_f < \alpha$.

In effect, the first rule states that we reject the hypothesis s = $s_h$ unless $x > \alpha s_h$ ; and the second
states that we reject the hypothesis s = $s_h$ unless $x \ge \alpha s_h$. Thus our rejection criteria are


Rule (i) Reject if $x \le \alpha s$ with $P_f \le \alpha$ ;
Rule (ii) Reject if $x < \alpha s$ with $P_f < \alpha$.


As below, we may then draw up a table of verdicts based on each of the foregoing rules 
for different experiments in which x = 5 and x = 10 respectively. In each case we assume 
that α = 0.05 is an acceptable level of uncertainty.


Hypothesis    $s_h$    Criterion                          x = 5                          x = 10
(h)                    ($\alpha s_h = 0.05 s_h$)    Rule (i)     Rule (ii)     Rule (i)     Rule (ii)
1               5         0.25                  Open         Open          Open         Open
2              19         0.95                  Open         Open          Open         Open
3              20         1.00                  Open         Open          Open         Open
4              21         1.05                  Open         Open          Open         Open
5              39         1.95                  Open         Open          Open         Open
6              40         2.00                  Open         Open          Open         Open
7              99         4.95                  Open         Open          Open         Open
8             100         5.00                  REJECT       Open          Open         Open
9             101         5.05                  REJECT       REJECT        Open         Open
10            199         9.95                  REJECT       REJECT        Open         Open
11            200        10.00                  REJECT       REJECT        REJECT       Open
12            201        10.05                  REJECT       REJECT        REJECT       REJECT
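
The table of verdicts follows mechanically from the two rejection criteria. A minimal sketch, in which the names `verdict` and `upper_limit` are illustrative and exact fractions avoid any rounding at the boundary x = αs:

```python
from fractions import Fraction

ALPHA = Fraction(1, 20)
SECTORS = [5, 19, 20, 21, 39, 40, 99, 100, 101, 199, 200, 201]

def verdict(x, s, rule):
    """Rule (i) rejects s when x <= alpha*s; Rule (ii) rejects when x < alpha*s."""
    criterion = ALPHA * s
    rejected = x <= criterion if rule == "i" else x < criterion
    return "REJECT" if rejected else "Open"

def upper_limit(x, rule):
    """Largest s among the twelve types on which the rule leaves the verdict open."""
    return max(s for s in SECTORS if verdict(x, s, rule) == "Open")

for x in (5, 10):
    for rule in ("i", "ii"):
        print("x =", x, "Rule", rule, "upper limit", upper_limit(x, rule))
```

The two rules part company only where αs is itself a whole number equal to the recorded score, i.e. for s = 100 when x = 5 and s = 200 when x = 10.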




The range of s values covered by open verdicts thus corresponds precisely with the outcome 
of our examination of Model I (c) for which the upper confidence limits are 99 and 199 respectively
for x = 5 and x = 10 with $P_f \le 0.05$ (Rule i) or 100 and 200 respectively for x = 5 and
x = 10 with $P_f < 0.05$ (Rule ii). The meaning of the correspondence is evident if we recall
the meaning of the true conditional uncertainty safeguard ($P_{f.h}$) of hypothesis H in the domain
of discrete score values. If our criterion of rejection is $x \le \alpha s$, we exclude samples whose
score value is $x = \alpha s$ only when $\alpha s$ itself is an integer. Thus $P_{f.h}$, the proportion of excluded score
values when hypothesis H is true, is the ratio to s of the nearest integer not exceeding $\alpha s$ and is
always less than or equal to α. If $0 \le \epsilon_h < 1$ we may thus write

$$P_{f.h} = \alpha - \epsilon_h ,$$

$$P_f = \sum_{h=1}^{h=12} P_h P_{f.h} = \sum_{h=1}^{h=12} P_h\,\alpha - \sum_{h=1}^{h=12} P_h\,\epsilon_h ,$$

$$\therefore\ P_f = \alpha - \sum_{h=1}^{h=12} P_h\,\epsilon_h .$$
Since we have chosen the rejection criterion so that $P_{f.h} \le \alpha$, all values of $\epsilon_h$ must be zero or
positive. Rule (ii) asserts that they are all positive, whence we obtain as for Model I (c),

$$P_f < \alpha .$$

In this instance, some values of $\epsilon_h$ are positive when we apply Rule (i) and others zero.
Thus $P_f \le \alpha$ as before; but this is not inconsistent with the assertion $P_f < \alpha$ being included
therein. A generalised Model II situation must take stock of the possibility that $P_{f.h} = \alpha$
for each wheel as would be true if we knew that the recorded score referred to a wheel of any
one of types 3, 6, 8, 11 above. For each of these $P_{f.h}$ = 0.05 and $\epsilon_h$ = 0 as will be seen by
citing the values of $P_{f.h}$ prescribed by our rejection criterion, viz. :


$s_h$      $\alpha s_h$      Rule (i)      Rule (ii)
5          0.25       0.0000        0.0000
19         0.95       0.0000        0.0000
20         1.00       0.0500        0.0000
21         1.05       0.0476        0.0476
39         1.95       0.0256        0.0256
40         2.00       0.0500        0.0250
99         4.95       0.0404        0.0404
100        5.00       0.0500        0.0400
101        5.05       0.0495        0.0495
199        9.95       0.0452        0.0452
200       10.00       0.0500        0.0450
201       10.05       0.0497        0.0497
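
The conditional safeguards tabulated above, and the unconditional safeguard they imply for the assemblage of 100 wheels, can be recomputed by counting excluded scores. This is a sketch under the stated model only; `conditional_error` and `unconditional_error` are illustrative names.

```python
from fractions import Fraction

ALPHA = Fraction(1, 20)
# (number of sectors s_h, number of wheels N_h) for the twelve types of Model II (c)
WHEELS = [(5, 13), (19, 2), (20, 1), (21, 3), (39, 7), (40, 12),
          (99, 3), (100, 4), (101, 9), (199, 10), (200, 15), (201, 21)]

def conditional_error(s, rule):
    """Proportion of the scores 1..s that lead us to reject the true value s."""
    if rule == "i":
        excluded = sum(1 for x in range(1, s + 1) if x <= ALPHA * s)
    else:
        excluded = sum(1 for x in range(1, s + 1) if x < ALPHA * s)
    return Fraction(excluded, s)

def unconditional_error(rule):
    """Sum of P_h * P_f.h over all types, with P_h = N_h / 100."""
    total = sum(n for _, n in WHEELS)
    return sum(Fraction(n, total) * conditional_error(s, rule) for s, n in WHEELS)

for s, _ in WHEELS:
    print(s, f"{float(conditional_error(s, 'i')):.4f}", f"{float(conditional_error(s, 'ii')):.4f}")
print(float(unconditional_error("i")), float(unconditional_error("ii")))
```

Both unconditional figures fall short of α because at least one $\epsilon_h$ is positive under either rule for this particular assemblage.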


In the treatment of Model I (c) we have already recognised one reason for regarding the 
concept of fiducial probability as an inadequate basis for a theory of statistical inference in that 
it restricts the field of discussion to continuous variates. Further consideration of the model 
situation we have last discussed gives us an opportunity for contrasting two theories of interval 
estimation from a different viewpoint. Fiducial probability takes its origin from assumptions 
common to the theory of confidence ; but Neyman’s development of the latter is inconsistent 
with Fisher’s interpretation of the former, unless there is some sense in which only one admissible 
preassigned rule of test procedure is appropriate to one and the same situation. Models I (c) 
and II (c) do indeed refer to a situation in which only one such rule invites our attention as 
relevant to the end in view ; but we have not excluded the possibility that more than one might 
each have seemingly equal claims to commend it from a purely formal viewpoint. We shall 
now examine a situation in which this dilemma arises. 

Since the issue has special relevance to the concept of fiducial probability, we shall postulate 
a continuous rectangular distribution over the range ½ to s + ½, and examine what statements
we may make when we draw two unit samples with scores $x_1$ and $x_2$. Two, though not the only
two, rules which we may formulate will serve our purpose well enough for heuristic purposes.
We shall alternatively seek to prescribe an upper confidence limit to s with an uncertainty
safeguard α by recourse to


(i) the maximum score $x_m$, being $x_m = x_1$ if $x_1 \ge x_2$ and $x_m = x_2$ if $x_2 > x_1$ ;

(ii) the score sum $x_{12} = x_1 + x_2$.

The probability that $x_m \le m$ is the probability assignable to the joint occurrence that each
score lies in the range from x = 0 to x = m inclusive, i.e.

$$P(x_m \le m) = \frac{m}{s} \cdot \frac{m}{s} = \frac{m^2}{s^2} .$$
We wish our final assertion to take the form s < kx with a probability (1 − α) of correct assertion
if we consistently follow the test procedure, whence we write $P(x_m > m) = (1 - \alpha)$,

$$\frac{m^2}{s^2} = \alpha \qquad \text{and} \qquad P(x_m > s\sqrt{\alpha}) = 1 - \alpha .$$

Within the framework of this rule, we then assign α as the uncertainty safeguard to the assertion

$$s < \frac{x_m}{\sqrt{\alpha}} .$$

If we base our test procedure on $x_{12}$ defined as above, the reader unfamiliar with the continuous
rectangular distribution will find it helpful first to make a simple chessboard diagram of the
2-fold discrete score-sum distribution. It is then evident that we may express the probability
that $x_{12}$ lies in the range 2 to k if x = 1 is the origin of the unit score distribution in two ways :

$$P(x_{12} > k) = \frac{(2s - k)(2s - k + 1)}{2s^2} \quad \text{when } k \ge s + 1 ;$$

$$P(x_{12} > k) = 1 - \frac{k(k - 1)}{2s^2} \quad \text{when } k \le s + 1 .$$


For the continuous case we may represent our chessboard geometrically as a rectangle of area
$s^2$ and the region in which all values $x_{12} \le k$ lie when $k \le s$ as a triangle of area $\tfrac{1}{2}k^2$. Since
we wish to associate a probability (1 − α) near unity to the truth of the assertion $s < k^{-1} \cdot x_{12}$,
our concern will be with the smaller value of k. For the continuous case we then write

$$P(x_{12} > k) = 1 - \frac{k^2}{2s^2} = 1 - \alpha ,$$

$$P(x_{12} > s\sqrt{2\alpha}) = 1 - \alpha .$$

Our second rule thus assigns α as the uncertainty safeguard to the assertion

$$s < \frac{x_{12}}{\sqrt{2\alpha}} .$$
We thus have two rules which assign different values to the upper confidence limit of s at 
one and the same confidence level (1 — a). In the strictly behaviourist formulation of confidence 
theory by Neyman this involves an inconsistency only if both rules incorporate all the information 
about the unknown parameter the sample can supply. One rule may be better than another, 
if it incorporates more information ; but its use may have drawbacks which outweigh its merit 
on that account. In Fisher’s theory of interval estimation no such freedom of choice is admissible.
The avowed intention of the concept of fiducial probability is to express the intensity
of legitimate conviction referable to a particular sample. If so, only one rule can be right,
namely the rule which invokes all the information the sample supplies. Fisher speaks of a
statistic, i.e. sample score, which has this property, as the sufficient one.
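
That both rules honour the same confidence level while prescribing different limits can be seen by simulation. The sketch assumes, for simplicity, two scores drawn independently from a continuous rectangular distribution over the range 0 to s, which is the range used in the derivation of the two probabilities above; the function names, trial count and seed are our own.

```python
import math
import random

def limits(x1, x2, alpha=0.05):
    """Upper confidence limits to s by the maximum-score rule and the score-sum rule."""
    by_max = max(x1, x2) / math.sqrt(alpha)
    by_sum = (x1 + x2) / math.sqrt(2 * alpha)
    return by_max, by_sum

def coverage(s, alpha=0.05, trials=100_000, seed=1):
    """Long-run frequency with which each rule makes a true assertion about s."""
    rng = random.Random(seed)
    ok_max = ok_sum = 0
    for _ in range(trials):
        x1, x2 = rng.uniform(0.0, s), rng.uniform(0.0, s)
        by_max, by_sum = limits(x1, x2, alpha)
        ok_max += s < by_max
        ok_sum += s < by_sum
    return ok_max / trials, ok_sum / trials

print(limits(3.0, 5.0))     # the two rules disagree on one and the same sample
print(coverage(75.0))       # yet each is right in about 95 per cent of trials
```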

The two statistics $x_m$ and $x_{12}$ used in the foregoing situation will serve to illustrate what
is and what is not a sufficient statistic in Fisher’s sense of the term, if we now consider $x_1$ and
$x_2$ as unit samples from a discrete rectangular universe with a range of scores from 1 to s inclusive.
In deriving a rule on the basis of either we have suppressed any explicit specification of $x_1$ and $x_2$.
If our chosen statistic is defective it can be so only for that reason. We shall therefore ask :
have we lost anything by withholding such information ? We may answer this by considering
the consequences of confining our attention in a sequence of trials to samples with some preassigned
value of $x_m$ or $x_{12}$.

Let us first suppose that the preassigned value of $x_m$ = 3. The different sorts of double
samples that are consistent with this value occur with equal frequency and are specifiable as
follows: (1, 3) ; (2, 3) ; (3, 3) ; (3, 2) ; (3, 1). This set of equally frequent values is the same
for all values of s consistent with the specification $x_m$ = 3. Thus we have suppressed no information
about s by scoring our sample in this way. Is the same true of $x_{12}$? Let us now consider
samples w.r.t. which $x_{12}$ = 8. This specification is consistent with any value s ≥ 4, but this
condition does not suffice to specify what individual values $x_1$ and $x_2$ have. If s = 4 the only
double sample consistent with the specification $x_{12}$ = 8 is (4, 4). If s = 5, three paired score
values are allowable : (3, 5) (4, 4) (5, 3). If s = 6 we may have: (2, 6) (3, 5) (4, 4) (5, 3) (6, 2).
Thus we can say more about s, if we know the individual score of $x_1$ and $x_2$ than we can if we know
only the value of the insufficient statistic $x_{12}$ ; but the individual values of $x_1$ and $x_2$ tell us no more
than we already know, if told the value of the sufficient statistic $x_m$.
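
The enumeration is easily repeated for other values. In the following sketch (function names are illustrative) the list of double samples with a given maximum is the same whatever s may be, whereas the list with a given sum lengthens as s increases.

```python
def pairs_with_max(m, s):
    """Ordered double samples from scores 1..s whose larger score is m."""
    return [(a, b) for a in range(1, s + 1) for b in range(1, s + 1) if max(a, b) == m]

def pairs_with_sum(t, s):
    """Ordered double samples from scores 1..s whose score sum is t."""
    return [(a, b) for a in range(1, s + 1) for b in range(1, s + 1) if a + b == t]

for s in (3, 5, 10):
    print(s, pairs_with_max(3, s))      # the same five pairs each time
for s in (4, 5, 6):
    print(s, pairs_with_sum(8, s))      # one, three and five pairs respectively
```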

We have now to state the definition of a sufficient statistic formally. To do so we first 
remind ourselves that to each 2-fold sample specified in terms of the sequence of unit samples 
we may assign as above a bivariate score, e.g. (3, 5) or (5, 3). We may then speak of $P_{12.s}$ as
the unconditional probability that any sample has the bivariate score ($x_1$, $x_2$) and $P_{12.m}$ as the
conditional probability that it has this score if $x_m$ is the maximum score. In the same sense,
we may label the unconditional probability of a multivariate score ($x_1, x_2, x_3, \ldots x_r$) definitive
of an r-fold sample as $P_{(1.2.3 \ldots r).p}$ for a distribution whose definitive parameter is p and
$P_{(1.2.3 \ldots r).x}$ as its conditional probability when the sample statistic is x, if we can define it
from our knowledge of x alone. We may then define by $P_{x.p}$ the probability that the sample
statistic will be x if the parameter is p and obtain by recourse to the product rule

$$P_{(1.2.3 \ldots r).p} = P_{(1.2.3 \ldots r).x} \cdot P_{x.p} .$$


We have now split the unconditional probability of getting the bivariate score which summarises 
all the information the sample supplies into two factors one of which is independent of p if the 
statistic is sufficient, i.e. if (as is true of $P_{12.m}$) we can specify it without knowing the value
of the universe parameter. We thus take as our formal criterion of a sufficient statistic the 
resolution of the probability of the multivariate score into two factors of which one does not 
contain p. 

By recourse to a simple chessboard lay-out of $s^2$ cells with border scores from x = 1 to
x = s inclusive and x = 1 to x = s inclusive we may amplify this breakdown w.r.t. $x_m$ and $x_{12}$
for the discrete rectangular universe. Each cell of the grid is referable to a unique pair of values
x = $x_1$ and x = $x_2$ but the same value of t = ($x_1 + x_2$) or of $x_m$ = m is in general assignable to more than
one cell. Cells specified by $x_m$ = m lie on two sides of a square of $m^2$ cells, there being
(2m − 1) in all. If we write $P_{12.s}$ for the probability that the sample records the unique pair
of score values $x_1$ and $x_2$ when the number of sectors is s, $P_{12.m}$ for the probability that
x has these two values when $x_m$ = m and $P_{m.s}$ for the probability that $x_m$ = m when s is the
number of sectors, we thus see that

$$P_{12.s} = \frac{1}{s^2} ; \qquad P_{12.m} = \frac{1}{2m - 1} ; \qquad P_{m.s} = \frac{2m - 1}{s^2} .$$

Hence in accordance with the product rule for conditional probabilities

$$P_{12.s} = P_{12.m} \cdot P_{m.s} .$$

We have thus split the probability assignable to the bivariate score $x_1$, $x_2$ into two factors one of
which ($P_{12.m}$) is independent of s ; and we might be tempted to think that we could specify
a corresponding identity $P_{12.s} = P_{12.t} \cdot P_{t.s}$ referable to the probabilities of getting the score
sum t when there are s sectors and getting the particular value of the bivariate score if also
($x_1 + x_2$) has the particular value t. Actually we cannot do so. All samples such that
($x_1 + x_2$) = t lie in a diagonal of (t − 1) cells if s ≥ (t − 1) ; and if we knew this we might write
$P_{12.t} = (t - 1)^{-1}$ which is again independent of s. Thus there will be 4 cells in the diagonal
corresponding to t = 5 if s ≥ 4 ; but there will be only 2 cells in it if s = 3. Given t we can
say that s ≥ ½t, e.g. s > 2 if t = 5, but we cannot say that s ≥ (t − 1). The mere fact that
t = 5 is therefore insufficient to assign a unique value to the conditional probability $P_{12.t}$.
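
Counting cells on the chessboard confirms both halves of the argument. The sketch below is illustrative (the helper names are ours, and `factors_for_max` presupposes m ≤ s): the conditional factor for the maximum score is free of s, while the length of the score-sum diagonal is not.

```python
from fractions import Fraction

def cells(s):
    """All equally likely ordered pairs of scores from a wheel with s sectors."""
    return [(a, b) for a in range(1, s + 1) for b in range(1, s + 1)]

def factors_for_max(m, s):
    """Return (P_12.m, P_m.s) found by counting the cells whose maximum score is m."""
    n = sum(1 for a, b in cells(s) if max(a, b) == m)
    return Fraction(1, n), Fraction(n, s * s)

def diagonal_length(t, s):
    """Number of cells whose score sum is t when there are s sectors."""
    return sum(1 for a, b in cells(s) if a + b == t)

for s in (3, 4, 9):
    conditional, marginal = factors_for_max(3, s)
    print(s, conditional, marginal, conditional * marginal, diagonal_length(5, s))
```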

In the same sense we may speak of the number (x) of successes in an r-fold sample from
an infinite 2-class universe as a sufficient statistic of the parameter p. We may denote by
$P_{(1.2.3 \ldots r).p}$ the probability that the sample records successes and failures in a fixed order,
there being $r_{(x)}$ different samples so distinguishable for the particular value x. Thus if q = (1 − p) we may
write

$$P_{(1.2.3 \ldots r).x} = \frac{1}{r_{(x)}} ; \qquad P_{x.p} = r_{(x)}\,p^x q^{r-x} ;$$

$$P_{(1.2.3 \ldots r).p} = p^x q^{r-x} = P_{(1.2.3 \ldots r).x} \cdot P_{x.p} .$$

One circumstance which gives the concept of sufficiency a peculiar importance vis-à-vis
Fisher’s approach to the problem of interval estimation is that it is not always possible to specify 
a sample by a statistic which is sufficient in his sense of the term. Since the fiducial probability 
distribution is in his formulation referable only to sufficient statistics and only to sufficient 
statistics themselves referable to continuous distributions, the fiducial theory of interval estimation 
is of much more limited application on its own terms than is Neyman’s theory of confidence. 


20.08 THE CLAIMS OF SMALL SAMPLE THEORY 


En passant in 20.03 (p. 859) and more explicitly at the end of 20.04, we have had occasion to 
emphasise an essential difference between the Yule-Fisher interpretation of test procedure in 
terms of significance and the Neyman-Pearson-Wald approach in terms of decision. More 
explicitly, we may distinguish between a test procedure of the latter type as one designed to 
give a yes-or-no answer and an alternative prescription of which the only decisive outcome with 
an assignable uncertainty safeguard is the negation of a particular hypothesis. Purists of the 
Yule-Fisher school may therefore say that their test prescription excludes the possibility of 
making an error of the second kind. We shall now seek to clarify the limitations this renunciation 
imposes on the laboratory or field worker.

For three decades, we have learned to think of economy of sample size as a prior desideratum
of test procedure and to envisage applicability to small samples as the supreme merit of the
type of significance test dealt with in Chapter 17. Whatever else of lasting value emerges from
the Neyman-Pearson concept of test power, it is clear that we must now re-examine any such 
claims without confusing what is a purely algebraic with what is wholly a logical issue. Since 
the practical objective of a test procedure is to arrive at a decision of some sort, we must indeed 
distinguish between the adequacy of a statistical technique: (a) to assign a precise uncertainty 
safeguard to whatever positive assertion it entitles us to make with the minimum expenditure 
of effort; (b) to give a decisive answer to a particular question with the utmost economy of 
materials. No one could now question that the class of tests prescribed by the school of R. A. 
Fisher are economical in terms of (a), e.g. within the same framework of initial assumptions about 
the structure of the parent universe the t-test permits us to assign a more precise uncertainty 
safeguard than a c-test to the decisive assertion that the null hypothesis is false unless the sample 
is very large. That this is so, does not dispose of a possibility of more concern to the research
worker. We achieve no economy by using small samples if our test procedure can assign no 
uncertainty safeguard to the majority of statements it leads us to make, i.e. if the overwhelming 
majority of permissible verdicts consistent with the choice of a false one as our null hypothesis 
are unproven. 

Against the background of 20.04, we cannot disclaim the obligation to examine this possibility; 
and may do so without invoking any sophisticated mathematics. It will suffice if we take a
back-stage view of the mechanism of the significance test for a proportionate score difference, or
so-called Chi-Square test for 1 d.f., in the taxonomic domain. We shall postulate $p_a$ and $p_b$ as
the true success rates for two treatment procedures (A and B), denoting by $p_{a.s}$ and $p_{b.s}$
the corresponding observed success rates for equal (r-fold) samples. If we write for brevity
($p_b - p_a$) = d and ($p_{b.s} - p_{a.s}$) = $d_s$, we may then define without appreciable error for sizeable
samples (e.g. r = 50) and for values of $p_a$ or $p_b$ in the neighbourhood of 0.5, the square of a normal
standard score of unit variance by the relations

$$c^2 = \frac{(d_s - d)^2}{\sigma_d^2} \qquad \text{and} \qquad \sigma_d^2 = \frac{p_a q_a + p_b q_b}{r} \qquad \text{(i)}$$

For heuristic purposes we may assume that we know the true value of $p_a$ = 0.5 for the yardstick
treatment (A), in which event

$$\sigma_d^2 = \frac{1 - 2d^2}{2r} \qquad \text{(ii)}$$


If we adopt the conventionally prescribed null hypothesis $d \le 0$, i.e. that treatment B is 
no better than treatment A, we may define for $d = 0$ our standard score in accordance with (i) 
and (ii) above as $c_0 = d_s\sqrt{2r}$. Whence for equal samples $r = 50$, $c_0 = 10d_s$. To assign an 
uncertainty safeguard $\alpha \le 0.05$ to the rejection of the hypothesis, we must make $c_0 = +1.64$ at 
the rejection level; and we shall then reject the null hypothesis only if $d_s > +0.164$. 
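
The arithmetic of this rejection criterion is easy to check numerically. What follows is a minimal sketch, assuming as in the text equal samples of size r, a common success rate of 0.5 when d = 0, and the conventional one-sided deviate 1.64; the function name `rejection_threshold` is illustrative only and forms no part of the original argument.

```python
import math


def rejection_threshold(r, z_alpha=1.64, p_a=0.5):
    """Smallest observed difference d_s at which we reject d <= 0.

    Assumes d = 0 (so p_b = p_a), whence sigma_d^2 = 2 * p_a * (1 - p_a) / r
    and the standard score is c_0 = d_s / sigma_d.
    """
    sigma_d = math.sqrt(2.0 * p_a * (1.0 - p_a) / r)
    return z_alpha * sigma_d


threshold = rejection_threshold(50)
print(round(threshold, 3))  # 0.164 for r = 50, as in the text
```

The same function shows how the criterion shrinks as r increases, e.g. `rejection_threshold(200)` is half the value for r = 50.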

That the probability of falsely rejecting a hypothesis which is in any event irrelevant to our 
practical aim does not exceed 5 per cent. will give us little reason for satisfaction if the same 
rejection criterion commonly leads us to an indecisive verdict when another, and maybe more 
relevant, hypothesis happens to be true. Taking a backstage view, we shall therefore suppose 
that we know treatment B to be at least 10 per cent. better than treatment A, i.e. the true 
hypothesis is $d = 0.1$. We shall thus define a standard score for $d = 0.1$ in terms of (i) and (ii) 
when $r = 50$ as 


$$c_1 = \frac{10(d_s - 0.1)}{\sqrt{0.98}}$$

We then recall that we have decided to reject the null hypothesis ($d \le 0$) if $d_s > 0.164$, i.e. if 

$$c_1 > \frac{10(0.164 - 0.1)}{\sqrt{0.98}} \quad\text{or}\quad c_1 > +0.65.$$

We are now in a position to answer the question we have raised above. We wish to know how 
often the conventional null hypothesis ($d = 0$) would lead us to suspend judgment when the 
truth is that $d = 0.1$. This is simply definable in terms of the area of the normal integral of 
unit variance in the range from $c_1 = -\infty$ to $c_1 = +0.65$, i.e. 

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{0.65} e^{-\frac{1}{2}c^2}\, dc = 0.74.$$

Thus the conventional null hypothesis test (at 5 per cent. significance level) will fail to result in a 
decision to reject it when the true operational advantage of treatment B is 10 per cent. in roughly 
74 per cent. of the situations we shall encounter; and the choice of a more exacting criterion of 
rejection will merely worsen our plight. We can guarantee with higher frequency a decisive 
statement, i.e. rejection of the prescribed null hypothesis ($d \le 0$) for the same conditions stated 
($d = 0.1$ and $p_a = 0.5$) with an uncertainty safeguard $\alpha \le 0.05$, only if we increase $r$, thereby 
diminishing our rejection criterion $d_s$. In short we can evade the sin of committing the error of 
the second kind only by incurring the risk of suspending judgment in most applications of the test 
procedure. If we wish to guarantee that 95 per cent. of our tests will lead us to a decisive 
conclusion, we have to adopt the dual test procedure of equalising the two risks ($\alpha = 0.05 = \beta$) 
under a different name; but we can do this only by prescribing sample size in conformity with the 
requirements of an admissible alternative hypothesis. 
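
The two figures on which this argument turns can be reproduced in a few lines. The sketch below is illustrative only: it assumes the normal approximation of (i), a yardstick rate of 0.5, a true difference of 0.1 and the deviate 1.64 for both risks; the function names are hypothetical and not part of the original text.

```python
import math


def normal_cdf(z):
    """Area of the normal curve of unit variance from minus infinity to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_suspended_judgment(d_true, r, z_alpha=1.64, p_a=0.5):
    """Chance that the test of d <= 0 fails to reject when d = d_true."""
    # Rejection threshold for d_s, fixed by the null hypothesis d = 0.
    threshold = z_alpha * math.sqrt(2.0 * p_a * (1.0 - p_a) / r)
    # Standard deviation of d_s when the true difference is d_true.
    p_b = p_a + d_true
    sigma_d = math.sqrt((p_a * (1.0 - p_a) + p_b * (1.0 - p_b)) / r)
    return normal_cdf((threshold - d_true) / sigma_d)


def sample_size_for_equal_risks(d_true, z=1.64, p_a=0.5):
    """Smallest r per sample at which both risks correspond to the deviate z."""
    p_b = p_a + d_true
    sd_null = math.sqrt(2.0 * p_a * (1.0 - p_a))              # sigma_d * sqrt(r) if d = 0
    sd_alt = math.sqrt(p_a * (1.0 - p_a) + p_b * (1.0 - p_b))  # the same if d = d_true
    return math.ceil((z * (sd_null + sd_alt) / d_true) ** 2)


print(round(prob_suspended_judgment(0.1, 50), 2))  # 0.74
print(sample_size_for_equal_risks(0.1))            # 533
```

On these assumptions the samples would need to be roughly ten times as large as r = 50 before the risk of a non-committal verdict falls to the level of the risk of false rejection.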

The present position is therefore this. The several criteria of excellence (consistency, 
efficiency and sufficiency) claimed for now commonly used significance tests referable to a unique 
null hypothesis are economical only in a special sense with little relevance to the main concern 
of the research worker, i.e. to attach an uncertainty safeguard to a clear-cut decision. What is 
equally relevant to the contemporary revaluation of current practice is that the preference for 
sufficient and efficient statistics limits our choice of a null hypothesis to what a restricted range of 
sampling distributions can accomplish, and hence to the formulation of the null hypothesis in 
terms which may have no relevance to operational intention. Indeed, we may sum up the argu- 
ment of 20.03-20.04 by saying that the examination of small samples in terms of the now most 
widely used test procedures invoking a single null hypothesis will lead to answers of only two 
sorts—one non-committal, the other quite often immaterial. Unless we examine the situation 
through the spectacles of the dual test procedure, we have no guarantee that the test performance 
in the overwhelming majority of situations will not lead to a non-committal answer, i.e. to no 
answer at all. 


20.09 THE SHOW MUST GO ON 


In the foreword to Vol. I of Chance and Choice the writer expressed the view that increasing 
demand for instruction in statistical techniques must encourage authoritarian attitudes in higher 
education unless there is a more lively recognition of the need for simple exposition of the logical 
assumptions inherent in the mathematical derivation of statistical procedures in common use. 
The new concepts discussed in this chapter offer the prospect of a radical re-examination of 
their credentials, and a challenge to which the conscientious teacher will respond. The time 
for revaluation is overdue, and in one sense, therefore, it is a most encouraging sign of our time 
that the basic concepts of statistical theory are in the melting pot. Meanwhile, it has what 
may be discouraging consequences to the student confronted with contradictory assertions at 
the most elementary level in current text books. 

Until the publication of two recent books,* Wald’s (1947) Sequential Analysis and Neyman’s 
(1950) First Course in Probability and Statistics, much of the subject matter of this chapter was 
accessible only in highly abstruse mathematical publications. Inevitably, therefore, there are as 
yet few who have fully assimilated the logical implications of the new concepts discussed in this 
context. By the same token, the teacher who does not confine his or her theme to the exposition 
of one corpus of current dogma will find that his or her own views undergo modification under 
the impact of recent advances in statistical theory. This is conspicuously true of Kendall’s 
invaluable treatise. In a rapidly and otherwise hopefully changing milieu, inconsistencies of 


* Also Feller’s (1950) first volume of an Introduction to Probability Theory and its Applications. 
statement in the writing of a book such as Chance and Choice have also been inescapable. For 
instance, the views expressed in 20.03 are inconsistent with that expressed on pp. 96-99 of 
Chapter 2 in Vol. 1; and the idiom of pp. 212-215 of Chapter 5 in Vol. I is not consistent with 
the more careful statement of the same issue in 20.05. 

By employing visual aids to exhibit the formal algebra of the classical theory of probability 
as in Vol. I and the derivation of the more widely used sampling distributions invoked by current 
statistical procedures as in this volume, the writer hopes that the outcome will be helpful to 
research workers hitherto hesitant to examine factual implications of statistical theory too long 
concealed by a facade of—to most of us—formidable mathematical operations ; and if the method 
of exposition proves to be useful to teachers who are themselves seeking to formulate the creden- 
tials of statistical theory in a less authoritative temper than that of their forerunners, the effort 
of completing a self-imposed task will not have been wholly fruitless. None the less, he retires 
from the stage with the misgiving that the issues raised in the controversy between the schools 
of R. A. Fisher on the one hand and of Neyman, Wald and E. S. Pearson on the other will in 
retrospect appear to be far more challenging than the last few pages may suggest. 

In retracing our steps with Neyman to the milieu in which the Founding Fathers set out to 
frame rules for the division of stakes to ensure success to the gambler who pursues any such 
rule consistently, we can interpret the risk of erroneous judgment only in terms consistent with 
a forward look. The risk we specify as our uncertainty safeguard is the risk of error associated 
only with the entire class of statements subsumed in a rule which we must state in advance. The 
reinstatement of the classical viewpoint thus deprives us of any right to associate a probability 
with a particular statement referable to a particular event, least of all to an event in the past. 
What is less obvious and has been emphasised too little in the foregoing pages is that a rule 
meaningfully so conceived must assign in advance the number of unit trials per game, i.e. the 
size of the sample. Thus the attractive algebraic properties which endow such distributions 
as the t- and the F- with a special claim to exactitude in one sense fail to confer on them the more 
exacting claims of a consistently forward look. 

If we adopt a behaviourist viewpoint to the proper uses of the algebraic theory of probability, 
we must repudiate any concepts referable to indefinable states of mind, restricting the terms of 
reference of the theory in the real world of human experience to long-run frequencies of events 
and statements about events in situations to which the stochastic calculus has a relevance endorsed 
by empirical evidence. To realise all the consequences of this reorientation, we shall then 
need to equip ourselves with a new vocabulary ; and much of the phraseology of this chapter 
will prove to be exceptionable if we undertake to reinterpret the legitimate scope of a calculus 
of judgments in an idiom consistent with the doctrine of chance in the setting of Pascal, J. 
Bernoulli and D’Alembert. 

So far as we can now foresee, such a restatement will extensively restrict the licence to 
invoke stochastic considerations in situations of which we cannot with full assurance postulate 
randomness sensu stricto. It will also restrict the form our statements can take when we are 
dealing with choice on which we impose randomisation by recipe. If we follow to the bitter 
end the trail which the new American school has blazed, we may therefore have to relinquish 
many hitherto cherished illusions concerning what help statistical theory can offer to the research 
worker. Such, as yet but dimly recognised, implications of the reorientation in which they 
invite us to participate sufficiently account for the unpalatability of their views. For my part, 
I am content to express the hope that the reader will face the challenge; and, if convinced that 
a transvaluation of statistical theory is overdue, accept what limitations intellectual rectitude 
imposes on its claims with cheerful resignation. 


APPENDIX 1 
CHOICE OF SYMBOLS 


In this volume I have employed some symbolic conventions which the reader will not find in 
use elsewhere, and it is fitting to add a few comments thereon. In general, algebraic symbolism 
advantageously conforms to the following requirements : (1) it should be evocative in the sense 
that its form is suggestive of its meaning either by consistent acceptance of arbitrary conventions 
or by literal association with the meaningful content ; (ii) it should be as explicit as necessary 
in the context without being more cumbersome than need be. Neither of these desiderata is 
realisable without compromise in certain situations. 

For instance, some writers use $S_n$ as shorthand for the sum of n terms and $S_n^2$ for the sum 
of the squares of n terms, when the latter form at its face value would suggest the square of the 
sum of n terms. One might write $S_{nq}$ or $S_{qn}$ for the latter, employing q to indicate the quadratic 
term, thereby exposing oneself to misunderstanding at another level, since $S_{nq}$ or $S_{qn}$ might 
mean the sum of nq terms. The alternative S,. would be cumbersome both from a semantic 
and from a typographic viewpoint. In such situations, I have deliberately defined a symbol 
ad hoc (e.g. Sy, on p. 495), regardless of its evocative content or of established usage. 

As regards (i), I have deliberately used n and r respectively for size of universe and size of 
sample, because the student is familiar with this convention in the domain of permutations 
and combinations ; and I have used subscript notation more extensively than do many teachers 
of the elements of statistical theory because the reference is easy to recognise or to remember 
by recourse to the initial letter of the appropriate epithet. Thus $M_t$ for the true value of the 
universe mean and $M_s$ for a sample mean are self-explanatory, if one commonly uses the 
subscripts t and s in this sense. For labelling cells (p. 409) of the chessboard diagram, I have not 
adhered to the matrix convention $x_{rc}$ for the score referable to the rth row and the cth column. 
Throughout this volume, the particular cell score $x_{ij}$ signifies that of column i and of row j. 
The general term $x_{cr}$ means a score in a column of r cells and in a row of c cells. If this offends 
a purist, I plead in extenuation that : (a) the chessboard is not a matrix in the ordinary sense of 
the term ; (b) I have avoided recourse to matrix operations on the assumption that many readers 
with sufficient knowledge of differential calculus for understanding the book as a whole are not 
familiar therewith. Absolute uniformity is alas unattainable while current instruction condones 
$\sin^{-1} a$ for arc sin a in conformity with the standard convention for inverse operations, but 
$\sin^n a = (\sin a)^n$. Similarly, we still use antilog a where $\log^{-1} a$ would be more in keeping 
with general usage. 

With reference to (ii) above, I wish to emphasise that algebra endorses by general consent 
a compromise common to all communication. One cannot steer a middle course between being 
too vague and being too long-winded if one repudiates the right to rely on context where context 
suffices to clarify meaning. My extensive use of the dot subscript symbolism (e.g. E,.+,¿) to 
distinguish operations referable to different dimensions is an attempt to avoid ambiguities which 
arise by leaving too much to context, and is consistent with an established convention of multiple 
regression and partial correlation. Some such convention is indispensable as a steering device 
through the maze of symbols invoked in the treatment of regression, of non-replacement sampling, 
and of patterns of variance analysis involving more than 2 dimensions of classification ; but 
I have not hesitated to drop it when I regard its unwieldiness as a handicap in a sufficiently 
explicit context. 

If I have sometimes used more explicit symbols than the context requires, it may perhaps 
have the advantage that it prepares the reader for their use when truly essential at a later stage. 
For example, M(V,) for the mean of the variance of the distribution of scores within the columns 
of a 2-way grid should more explicitly be E,(V;.;) in conformity with the conventions 
consistently used in the treatment of a 3-way grid. Similarly, $M_i$ for the mean of the ith column 
is more explicitly represented by $M_{i.j}$; and when one actually substitutes numerical values 
for j and i in such context the more explicit convention is an indispensable safeguard against 
ambiguity. Some inconsistencies of this sort have arisen because the need for such safeguards 
was less apparent at an early stage than later in the course. Thus in some figures set up as 
wall charts at an early stage (e.g. Fig. 89) I rely on the order of the subscripts to convey the 
correct dimension of an operation without recourse to the dot notation. 


APPENDIX 2 
THE METHOD OF LEAST SQUARES 


The method of least squares dates from Legendre and Laplace in the second decade of the 
nineteenth century ; but it was Gauss whose writings first familiarised physicists with its use. 
Gauss gives a derivation in a memoir written in classical Latin (1821) and, perhaps for that 
reason, the rationale cited in standard works on the combination of observations follows the 
line of thought of Laplace rather than of Gauss himself. As emphasised recently by Plackett 
(Biometrika, 1949), the theorem of least squares established by Gauss is substantially equivalent 
to the theorem commonly associated with the name of Markoff (1912). It does not presuppose 
a normal—or any other—distribution of errors, and it makes no appeal to the highly arbitrary 
axiom that the best estimate of the unknown parameter is that value which would assign the 
highest probability to the observations. 

The method itself takes its origin from the need for some agreement about how to weight 
different observations which lead to inconsistent estimates of an unknown quantity. Gauss lays 
down the principle that the preferable estimate of the latter shall be the unbiased estimate whose 
sample distribution has minimal variance. If there are many sets of observations the derivation 
of the estimator with this property involves many sets of equations. It is therefore impracticable 
to exhibit the proof as applied to a specific problem involving combination of observations in its 
most general form without recourse to matrix algebra; but the pattern of the proof is easy to 
illustrate by recourse to a situation in which only 3 sets of paired observations are available for 
the determination of a single parameter such as the so-called simple regression coefficient, or 
the slope constant (k) of a linear physical law, i.e. 


$$y = kx + C \tag{i}$$

We shall here assume that we have three observation equations each involving an error, 
$\epsilon_1$, $\epsilon_2$ or $\epsilon_3$: 

$$y_1 = kx_1 + C - \epsilon_1$$
$$y_2 = kx_2 + C - \epsilon_2$$
$$y_3 = kx_3 + C - \epsilon_3$$

We also assume that the selected fixed values of $x_1$, $x_2$, $x_3$ are not subject to error or, what comes 
to the same thing, that errors to which $y_r$ and $x_r$ are subject are additive. On this understanding 
we wish to weight $y_1$, $y_2$, $y_3$ so that the derived estimate ($k_s$) of $k$ is unbiased with minimum 
variance. Thus we initially postulate $k_s$ as a weighted average of $y_r$: 


$$k_s = W_1y_1 + W_2y_2 + W_3y_3 \tag{ii}$$

If our estimator is unbiased $E(k_s) = k$, 

$$\therefore\ W_1E(y_1) + W_2E(y_2) + W_3E(y_3) = k = \sum_{r=1}^{3} W_rE(y_r) \tag{iii}$$

In this equation 

$$W_rE(y_r) = W_rkx_r + W_rC - W_rE(\epsilon_r) \tag{iv}$$

On the assumption that the true value of $y_r$ is the mean of an indefinitely large number of observed 
values $y_{r.s}$, the mean value of the error terms in (iv) is zero, i.e. $E(\epsilon_r) = 0$, so that 

$$W_rE(y_r) = W_rkx_r + W_rC,$$

$$\therefore\ \sum_{r=1}^{3} W_rE(y_r) = k\sum_{r=1}^{3} W_rx_r + C\sum_{r=1}^{3} W_r.$$

To satisfy (iii) and hence the condition that $k_s$ is an unbiased estimator we must make 

$$\sum_{r=1}^{3} W_rx_r = 1 \quad\text{and}\quad \sum_{r=1}^{3} W_r = 0 \tag{v}$$

More fully we may write these equations as 

$$W_1x_1 + W_2x_2 + W_3x_3 = 1 \quad\text{and}\quad W_1 + W_2 + W_3 = 0 \tag{vi}$$

Hence also $W_1x_2 + W_2x_2 + W_3x_2 = 0 = W_1x_3 + W_2x_3 + W_3x_3$, so that 

$$W_2 = \frac{1 - W_1x_1 + W_1x_3}{x_2 - x_3} \quad\text{and}\quad W_3 = \frac{1 - W_1x_1 + W_1x_2}{x_3 - x_2} \tag{vii}$$
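
The relations (vi) and (vii) lend themselves to a direct numerical check. The following sketch assumes three hypothetical fixed values of x and an arbitrary trial value of the first weight; `weights_from_w1` is an illustrative name, not part of the original derivation. Whatever trial value one chooses, the weights sum to zero, the weighted sum of the x-values is unity, and error-free observations return the true slope.

```python
def weights_from_w1(w1, x1, x2, x3):
    """Weights W_2 and W_3 implied by equation (vii) for a chosen W_1."""
    w2 = (1.0 - w1 * x1 + w1 * x3) / (x2 - x3)
    w3 = (1.0 - w1 * x1 + w1 * x2) / (x3 - x2)
    return w1, w2, w3


# Hypothetical fixed values of x and an arbitrary trial value of W_1.
x = (1.0, 2.0, 4.0)
w = weights_from_w1(0.7, *x)

sum_w = sum(w)                                  # should vanish, by (vi)
sum_wx = sum(wi * xi for wi, xi in zip(w, x))   # should equal 1, by (vi)

# With error-free observations y = k*x + C the weighted sum recovers k.
k, c = 3.0, 5.0
k_s = sum(wi * (k * xi + c) for wi, xi in zip(w, x))

print(abs(sum_w) < 1e-12, abs(sum_wx - 1.0) < 1e-12, abs(k_s - k) < 1e-9)
```

Unbiasedness alone therefore leaves one weight free; it is the requirement of minimal variance which fixes it.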


Equation (vii) thus defines the relations between the weights $W_1$, $W_2$, $W_3$ in (ii) and the 
chosen fixed values $x_1$, $x_2$, $x_3$ consistent with the condition that $k_s$ is an unbiased estimator of $k$. 
To say that it has minimal variance is to say that $V_{k.s} = E(k_s - k)^2$ is as small as possible. 
Now we may define the sampling variance $V_{k.s}$ in accordance with (ii) in terms of the 
variances ($\sigma_r^2$) of the distribution of the values $y_{r.s}$ of $y$ associated with a fixed value ($x_r$) of 
$x$, viz. 

$$V_{k.s} = W_1^2\sigma_1^2 + W_2^2\sigma_2^2 + W_3^2\sigma_3^2.$$

Our assumption is that $y_{r.s} = y_r + \epsilon_s$, i.e. the variance of $y_{r.s}$ depends only on that ($\sigma_\epsilon^2$) of 
the error distribution which is independent of $y_r$. Thus $\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma_\epsilon^2$, 

$$\therefore\ V_{k.s} = (W_1^2 + W_2^2 + W_3^2)\sigma_\epsilon^2 \tag{viii}$$

To choose our weights $W_1$, $W_2$, $W_3$ so that $V_{k.s}$ will be a minimum, we must satisfy the 
condition 

$$\frac{\partial V_{k.s}}{\partial W_1} = \frac{\partial V_{k.s}}{\partial W_2} = \frac{\partial V_{k.s}}{\partial W_3} = 0 \tag{ix}$$

We thus obtain from (viii): 

$$\frac{\partial W_1^2}{\partial W_r} + \frac{\partial W_2^2}{\partial W_r} + \frac{\partial W_3^2}{\partial W_r} = 0 \tag{x}$$

It suffices to evaluate this for $W_r = W_1$, so that 

$$W_1 + W_2\frac{\partial W_2}{\partial W_1} + W_3\frac{\partial W_3}{\partial W_1} = 0 \tag{xi}$$


We derive from (vii): 

$$W_2\frac{\partial W_2}{\partial W_1} = \frac{(1 - W_1x_1 + W_1x_3)(x_3 - x_1)}{(x_2 - x_3)^2},$$

$$W_3\frac{\partial W_3}{\partial W_1} = \frac{(1 - W_1x_1 + W_1x_2)(x_2 - x_1)}{(x_3 - x_2)^2}.$$

On substituting these values in (xi) we get 

$$(x_2 - x_3)^2W_1 + (x_3 - x_1)(1 - W_1x_1 + W_1x_3) + (x_2 - x_1)(1 - W_1x_1 + W_1x_2) = 0,$$

$$\therefore\ 2W_1(x_1^2 + x_2^2 + x_3^2 - x_1x_2 - x_1x_3 - x_2x_3) = 2x_1 - x_2 - x_3 \tag{xii}$$

We can simplify (xii) by introducing $M_x$, the mean value of $x$, so that 

$$3M_x = x_1 + x_2 + x_3 \quad\text{and}\quad 3(x_1 - M_x) = 2x_1 - x_2 - x_3;$$

$$3(x_1 - M_x)^2 + 3(x_2 - M_x)^2 + 3(x_3 - M_x)^2 = 2(x_1^2 + x_2^2 + x_3^2 - x_1x_2 - x_1x_3 - x_2x_3).$$

Whence by substitution in (xii): 

$$W_1 = \frac{x_1 - M_x}{(x_1 - M_x)^2 + (x_2 - M_x)^2 + (x_3 - M_x)^2}$$
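
That this value of the first weight does in fact make the sum of the squared weights, and hence the sampling variance of (viii), a minimum may be confirmed by trial. The sketch below is an illustration under assumed values: three hypothetical fixed values of x, with the remaining weights tied to the first by (vii); the function names are not part of the original text.

```python
def weights_from_w1(w1, x1, x2, x3):
    """Weights W_2 and W_3 implied by equation (vii) for a chosen W_1."""
    w2 = (1.0 - w1 * x1 + w1 * x3) / (x2 - x3)
    w3 = (1.0 - w1 * x1 + w1 * x2) / (x3 - x2)
    return w1, w2, w3


def sum_of_squared_weights(w1, x):
    """The factor multiplying the error variance in equation (viii)."""
    return sum(wi * wi for wi in weights_from_w1(w1, *x))


x = (1.0, 2.0, 4.0)                      # hypothetical fixed values
m_x = sum(x) / 3.0
s_xx = sum((xi - m_x) ** 2 for xi in x)
w1_opt = (x[0] - m_x) / s_xx             # the value derived from (xii)

best = sum_of_squared_weights(w1_opt, x)
nearby = [sum_of_squared_weights(w1_opt + step, x)
          for step in (-0.1, -0.01, 0.01, 0.1)]

# Every departure from w1_opt enlarges the sum of squared weights.
print(all(value > best for value in nearby))
```

For the same trial values the other two weights given by (vii) agree with the corresponding deviations of x from its mean divided by the sum of the squared deviations.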
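More generally for $n$ paired values ($y_r$, $x_r$) we may write 

$$W_r = \frac{x_r - M_x}{\sum_{r=1}^{n}(x_r - M_x)^2}.$$

Whence by substitution in (ii): 

$$k_s = \frac{\sum_{r=1}^{n} y_r(x_r - M_x)}{\sum_{r=1}^{n}(x_r - M_x)^2} \tag{xiii}$$

We may express this relation in another form by using the substitution 

$$\sum_{r=1}^{n} y_r = nM_y.$$

In (xiii) above 

$$y_r(x_r - M_x) = (y_r - M_y)(x_r - M_x) + M_y(x_r - M_x) \quad\text{and}\quad M_y\sum_{r=1}^{n}(x_r - M_x) = 0.$$

Hence we may write (xiii) in the more familiar guise: 

$$k_s = \frac{\sum_{r=1}^{n}(y_r - M_y)(x_r - M_x)}{\sum_{r=1}^{n}(x_r - M_x)^2}$$
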
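
The equivalence of the weighted form (xiii) and the more familiar product form of the slope estimate may be illustrated as follows. This is a minimal sketch with hypothetical paired values; the function names `slope_weighted` and `slope_product_form` are assumptions introduced for the illustration.

```python
def slope_weighted(xs, ys):
    """Slope estimate as the weighted sum of y, with W_r = (x_r - M_x) / S_xx."""
    m_x = sum(xs) / len(xs)
    s_xx = sum((x - m_x) ** 2 for x in xs)
    weights = [(x - m_x) / s_xx for x in xs]
    return sum(w * y for w, y in zip(weights, ys))


def slope_product_form(xs, ys):
    """Slope estimate as sum of products of deviations over sum of squares."""
    m_x = sum(xs) / len(xs)
    m_y = sum(ys) / len(ys)
    s_xy = sum((y - m_y) * (x - m_x) for x, y in zip(xs, ys))
    s_xx = sum((x - m_x) ** 2 for x in xs)
    return s_xy / s_xx


# Hypothetical paired observations, roughly following y = 2x.
xs = [1.0, 2.0, 4.0, 7.0]
ys = [2.1, 3.9, 8.2, 13.8]

print(abs(slope_weighted(xs, ys) - slope_product_form(xs, ys)) < 1e-12)
```

Both forms return the true slope exactly when the observations lie on a straight line without error.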
(In compiling this index, key references only have been cited ; relevant cross-references are freely mentioned 
in the text.) 


Analysis of Covariance, 764, 771, 783 
Analysis of Variance, 407, 410, 448, 450, 455, 532 
Additive principle in, 554 
confidence limits, 681 
degrees of freedom, 572, 682 
for one criterion of classification, 569 
for two criteria of classification, 448, 533 
for three criteria of classification, 455, 543, 560 
interaction, 564 
model I and model II, 548, 731, 752 
replication, 556, 560 
significance tests, 669 
variance ratio, 655, 699 
Approximations, 45 
for factorials of large numbers, 237 
in solution of differential equations, 48 
in summation, 54 


Bayes’ postulate, 198, 206, 853 
theorem, 195, 849, 897 
Behrens test, 897 
Bernoulli’s theorem, 133, 147 
Bernoullian universe, 510, 514, 626, 897 
Beta function, 229, 251, 256, 646, 650 
Binomial distribution, 28, 37, 110, 229, 295, 594, 
605 
histogram, 110, 115, 223 
Bivariate universe, 326, 360, 428, 475, 714 
Burette universe, 607, 620, 827 


c-test (critical ratio), 127, 128, 148, 187, 192, 203, 
213, 303, 313, 323, 691, 699, 906 
Central difference, 116, 120, 230 
Chi-Square distribution, 217, 257, 263, 427, 612, 
644, 663, 665, 667, 670, 828, 833 
Classification, 60, 68 
criteria of, in analysis of variance, 533 
manifold, 428, 804, 828 
Co-moments, 808 
Co-prime samples, 169, 340 
Concomitant variation (see correlation), 326, 360, 
475, 482, 527, 528 
Concurrence, 326, 369, 384, 389, 400, 747, 750, 
784 
Confidence limits, 211, 213, 219, 695, 887 
theory, 859, 885, 897 
Consequence, 326, 369, 384, 389, 400, 748, 750, 
784 
Contingency grid, 429 
Correlation, 326 
in factor analysis, 784 
grid, 358, 429 
and linear regression, 383, 388, 528 
multiple, 752 
partial, 390, 483, 527 
product-moment, 344, 349, 353 
rank, 331, 350 
ratio, 379, 441, 706 
tautologies of, 441 
umpire-bonus model in, 360 
universe, 476, 477 
Covariance, 344, 360 
addition of, 459 
analysis of, 764, 771, 783 
Curve-fitting, 110, 580, 595 
by method of moments, 232, 254 
and regression, 712, 746 


Decision test, 848 
Design of Experiments, 558, 783 
Difference distributions, 143, 153, 179, 183, 194, 
271, 290, 611, 825 
equation, 119 
Diophantine equation, 171 
Discriminant function, 759 
Distribution function, 230 
Double dichotomy, 880 


E-notation, 434, 453, 459 
Efficiency, 217, 294, 746 
Electivity, 75, 81, 91, 101 
Errors of first and second kind, 862 
Expectation fit, 581 


F-distribution, 653, 655, 699, 710, 740, 773, 783 
Factor Analysis, 784, 802 
attenuation, 791 
factor pattern, 785, 793, 795, 798 
hierarchical principle, 785 
reliability, 791 
saturation, 792 
umpire-bonus model in, 484, 487 
Fiducial distributions, 897 
limits, 213, 219 
Figurate series, 13 
in sampling, 65 
in summation, 462 
Fixed A-set, 714, 750 
Frequency grid, 407, 429, 432 
histogram, 110, 581, 636 
proportionate and relative, 101, 543 


Gamma function, 229, 246, 251, 256, 589, 646, 660 
tabulation, 659 
Gregory’s formula, 33, 35, 52, 235 
Grid, 
contingency, 429 
correlation, 358, 429 
frequency, 407, 429 
independence, 439, 465 
regression in, 442 
score, 407, 410, 429, 448 
tautologies of, 428 
types of, 428 


Half-interval correction, 115, 116, 127 
Homogeneity, 453, 541, 897 
criteria of, 413, 543 
Homoscedasticity, 384, 480, 714, 721 
Hypergeometric distribution, 139, 230, 804 


Independence, 326, 331, 349, 635 
condition, 665 
grid, 439, 465 
Integration, 50, 234 
Interval Estimation, 886, 897 
Inverse probability, 199 


Leibnitz’ rule, 587 
Lexis models, 394, 508 
Likelihood, 197 


Maclaurin’s theorem, 35, 46, 234, 265 
Mean deviation, 231, 250 
Method of least squares, 713, 746, Appendix II 
Modular likelihood, 204, 340, 845, 863 
Moments, 229, 579 
as descriptive parameters, 231 
as gamma functions, 250 
derivation of, 582 
factorial, 591, 804 
generating functions, 264, 465, 582, 601 
of Bernoullian universe, 626 
of binomial distribution, 594 
of chi-square distribution, 833 
of difference distribution, 271, 611 
of distribution of the mean, 602 
of normal distribution, 595 
of Poisson distribution, 592 
of rectangular distribution, 593 
of score-sum distribution, 807 
Multinomial theorem, 39, 42, 79, 809 


Necessary and Sufficient Conditions, 528 
Non-replacement distribution, 137, 490, 814, 825 
Normal distribution, 115, 127, 139, 230, 250, 279, 
294, 317, 427, 595 
approximations to, 616 
Null hypothesis, 96, 105, 143, 153, 187, 205, 207, 
840, 859 


Ordinate fit, 581 
Orthogonal transformation, 520, 674 


Paired differences, 312, 630, 691 
c-test for, 313 
t-test for, 313, 704 
Pascal’s triangle, 25, 66 
Pearson system, 254, 427, 634, 646 
Pearson’s coefficients, 232, 254, 342, 593, 604, 634, 
821 
of binomial distribution, 605 
of burette universe, 607 
of Poisson distribution, 605 
of rectangular distribution, 605 
Poisson distribution, 136, 223, 403 
moments, 592 
Pearson coefficients, 605 
Posterior probability, 197, 852 
Prior probability, 195, 197, 206, 852, 867 
Probability density, 183, 636 
generating function, 465, 467 
integral, 127 
Probable error, 204, 659 


Quality Control, 842 


Rectangular distribution, 255, 261, 280 
moments, 593 
Pearson’s coefficients, 605 
Regression, 377, 528, 712, 764 
as standardising device, 767 
coefficients, 383, 388, 442 
equation, computation for, 726 
estimates, 728, 740, 756 
linear, 377, 289, 442, 485, 492, 501, 505, 527, 
765 
multiple, 752 
Rigour, 580 


Sampling, 58, 394, 428 
classified, 68 
distributions, 148, 153, 166, 535, 634, 642 
from different universes, 294, 625 
models, 59 
randomisation, 62, 94 
restrictive and repetitive, 71 
size, and significance, 218, 906 
size, as a source of variation, 517 
without replacement, 71, 508, 804, 814 
Score-grid, 407, 410, 430 
summarising indices, 448 
symbolism, 448 
tautologies, 448, 532 
three-dimensional, 452 
Scoring, taxonomic and representative, 194, 275, 
490, 494, 533, 536, 814, 828 
Sequential ratio, 849, 854, 880 
Significance, 104, 105, 108, 195, 222, 840 
and sample size, 218, 906 
test, 105, 148, 187, 203, 275, 579, 634, 848, 906 
for analysis of variance, 669, 706 
for regression estimates, 734, 740, 759 
Small sample theory, 906 
Standard error, 304 
Standard score (see Critical ratio), 128, 526, 598, 